Abstract

A consistent query answer in an inconsistent database is an answer 
obtained in every (minimal) repair. The repairs are obtained by re- 
solving all conflicts in all possible ways. Often, however, the user is 
able to provide a preference on how conflicts should be resolved. We 
investigate here the framework of preferred consistent query answers, 
in which user preferences are used to narrow down the set of repairs 
to a set of preferred repairs. We axiomatize desirable properties of 
preferred repairs. We present three different families of preferred re- 
pairs and study their mutual relationships. Finally, we investigate the 
complexity of preferred repairing and computing preferred consistent 
query answers. 
1 Introduction



In many novel database applications, violations of integrity constraints can- 
not be avoided. A typical example is integration of two consistent data 
sources that contribute conflicting information. Inconsistencies also often 
occur in the context of long running operations. Finally, integrity enforce- 
ment may be disabled because of efficiency considerations. Integrity con- 
straints, however, capture important semantic properties of the stored data. 
These properties directly influence the way a user formulates a query. Eval- 
uation of the query over an inconsistent database may yield answers that 
are meaningless or misleading. 

The framework of repairs and consistent query answers [4] has been proposed
to offset the impact of inconsistencies on the accuracy of query answers.
A repair is a consistent database minimally different from the given
one, and a consistent answer to a query is an answer present in every repair. 
This approach does not physically remove any facts from the database. The 
framework of [3] has served as a foundation for most of the subsequent work 
in the area of querying inconsistent databases (for surveys of the area
see [71 Emu E]; other works include [26]).

Recently, the problem of database repairing has received renewed
interest [21 US]. Essentially, the goal is to construct a repair of a possibly
inconsistent instance by resolving every conflict present in the given instance.
In the case of denial constraints, the class of constraints we consider in this
paper, a conflict is simply a set of facts that are present in the given instance
and together violate a constraint. A resolution of a conflict is the deletion
of one of the facts creating the conflict. Typically, there exists more than
one repair, and a repairing algorithm needs to make some nondeterministic
choices when repairing the database instance. It is desirable for the
algorithm to be sound, i.e. to always produce a repair, that is, an instance that
is not only consistent but also minimally different from the given one. It
is even more desirable for the algorithm to be complete, i.e. to be able to
produce every repair with an appropriate sequence of choices [23].




Example 1 Consider the schema consisting of two relations

$Emp(Name, Salary, Dept)$ and $Mgr(Name, Salary, Dept)$,

and the set of constraints $F_0$ consisting of

$Emp : Name \to Salary\ Dept$,

$\forall x, y, z, x', y'.\ \neg[Emp(x, y, z) \land Mgr(x', y', z) \land y > y']$.
The first constraint is a key dependency requiring the employee information
to be associated with the name. The second constraint is a denial constraint 
requiring no employee of a department to earn more than the manager of 
the department. 

Now, consider the inconsistent database instance

$$I_0 = \{Emp(John, \$40k, IT), Emp(John, \$50k, IT), Emp(John, \$80k, IT), Mgr(Mary, \$70k, IT)\}.$$

This instance contains three conflicts w.r.t. the functional dependency and
one conflict w.r.t. the denial constraint. $I_0$ has three repairs w.r.t. $F_0$:

$I'_1 = \{Emp(John, \$80k, IT)\}$,

$I'_2 = \{Emp(John, \$50k, IT), Mgr(Mary, \$70k, IT)\}$,

$I'_3 = \{Emp(John, \$40k, IT), Mgr(Mary, \$70k, IT)\}$.

Consider the query $Q_0 = \exists x, y.\ Emp(John, x, y) \land x > \$60k$ asking whether
John earns more than \$60k. The answer to $Q_0$ in the database instance $I_0$ is
true. However, true is not a consistent answer to $Q_0$ because of the repairs
$I'_2$ and $I'_3$. □
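
The repairs and the consistent answer of Example 1 can be checked mechanically. The following Python sketch is illustrative only: the tuple encoding of facts and the names `in_conflict`, `is_consistent`, `repairs`, and `q0` are assumptions introduced here, and the enumeration is brute force (exponential in the size of the instance).

```python
from itertools import combinations

# Facts of I_0 encoded as (relation, name, salary in $k, dept); the encoding is an assumption.
I0 = [
    ("Emp", "John", 40, "IT"),
    ("Emp", "John", 50, "IT"),
    ("Emp", "John", 80, "IT"),
    ("Mgr", "Mary", 70, "IT"),
]

def in_conflict(p, q):
    """True if the two facts together violate a constraint of F_0."""
    if p[0] == "Mgr":
        p, q = q, p  # put the Emp fact first when there is one
    if p[0] == "Emp" and q[0] == "Emp":
        # Key dependency Emp: Name -> Salary Dept.
        return p != q and p[1] == q[1]
    if p[0] == "Emp" and q[0] == "Mgr":
        # Denial constraint: an employee must not earn more than the department manager.
        return p[3] == q[3] and p[2] > q[2]
    return False

def is_consistent(facts):
    return not any(in_conflict(p, q) for p, q in combinations(facts, 2))

def repairs(instance):
    """All maximal consistent subsets of the instance."""
    consistent = [frozenset(s) for r in range(len(instance) + 1)
                  for s in combinations(instance, r) if is_consistent(s)]
    return [s for s in consistent if not any(s < t for t in consistent)]

def q0(facts):
    """Q_0: does John earn more than $60k?"""
    return any(f[0] == "Emp" and f[1] == "John" and f[2] > 60 for f in facts)

reps = repairs(I0)
answer_in_I0 = q0(I0)                           # answer in the inconsistent instance
true_is_consistent = all(q0(r) for r in reps)   # is true obtained in every repair?
```

Under this encoding the sketch finds the three repairs listed above; `answer_in_I0` is true while `true_is_consistent` is false, matching the discussion of $Q_0$.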

One of the drawbacks of the framework of consistent query answers is 
that it considers all possible ways to resolve the existing conflicts. The user, 
however, may have a preference on what resolutions to consider. Typical 
information used to express the preference includes: 

• the timestamp of creation/last modification of the fact; the conflicts 
can be resolved by removing from consideration old, outdated facts, 

• the source of the fact (in data integration setting); the user can con- 
sider the data from one source more reliable than the data from an- 
other, 

• the data values stored in the conflicting facts. 

To improve the quality of consistent answers we propose extending the 
framework of repairs and consistent query answers with the preference infor- 
mation. We use the preference information to define a set of preferred repairs 
(a subset of all repairs). Query answers obtained in every preferred repair 
are called preferred consistent query answers. For instance, in the previous 
example if the database contains an employee who earns more than her man- 
ager, then we might prefer to remove the information about the employee 
rather than the information about the manager of the department. Then 
the preferred repairs are $I'_2$ and $I'_3$, and consequently, false is the preferred
consistent answer to $Q_0$.

We observe, however, that there may be more than one way to select the
preferred repairs based on the user preference, especially when a resolution
of one conflict affects the way another conflict can be resolved.

Example 2 We take the schema consisting of one relation name

$Mgr(Name, Salary, Dept)$

with two functional dependencies

$Mgr : Name \to Salary\ Dept$ and $Mgr : Dept \to Name\ Salary$.

Consider the following inconsistent instance

$$I_1 = \{Mgr(Bob, \$70k, R\&D), Mgr(Mary, \$40k, IT), Mgr(Ken, \$60k, IT), Mgr(Bob, \$60k, AD), Mgr(Mary, \$50k, PR), Mgr(Ken, \$50k, PR)\}.$$

This instance contains five conflicts:

1. $Mgr(Bob, \$70k, R\&D)$ and $Mgr(Bob, \$60k, AD)$,

2. $Mgr(Mary, \$40k, IT)$ and $Mgr(Mary, \$50k, PR)$,

3. $Mgr(Ken, \$60k, IT)$ and $Mgr(Ken, \$50k, PR)$,

4. $Mgr(Mary, \$40k, IT)$ and $Mgr(Ken, \$60k, IT)$,

5. $Mgr(Mary, \$50k, PR)$ and $Mgr(Ken, \$50k, PR)$.

These conflicts may arise from changes that are not yet fully propagated.
For instance, Bob may have been moved to manage the R&D department while
previously being the manager of AD, or Bob may have been moved from the
AD department to the R&D department. Similarly, Mary may have been
promoted to manage PR whose previous manager was moved to manage IT,
or conversely, Ken may have been moved to manage IT, while Mary was
moved from IT to manage PR.

The set of repairs of $I_1$ consists of four instances:

$I'_1 = \{Mgr(Bob, \$70k, R\&D), Mgr(Mary, \$50k, PR), Mgr(Ken, \$60k, IT)\}$,

$I'_2 = \{Mgr(Bob, \$70k, R\&D), Mgr(Mary, \$40k, IT), Mgr(Ken, \$50k, PR)\}$,

$I'_3 = \{Mgr(Bob, \$60k, AD), Mgr(Mary, \$40k, IT), Mgr(Ken, \$50k, PR)\}$,

$I'_4 = \{Mgr(Bob, \$60k, AD), Mgr(Mary, \$50k, PR), Mgr(Ken, \$60k, IT)\}$.
Suppose that for a conflict created by two facts referring to the same person
the user prefers to resolve it by removing the tuples with smaller salary. This
preference expresses the belief that if a manager is being reassigned, her
salary is not decreased. This preference applies to the first conflict: the fact
$Mgr(Bob, \$70k, R\&D)$ is preferred over $Mgr(Bob, \$60k, AD)$. Similarly, the
preference applies to the second and the third conflict. It does not apply
to the last two conflicts as each of them involves facts referring to different
persons.

The preference information on resolutions of the first conflict allows us to
eliminate the last two repairs $I'_3$ and $I'_4$. Similarly, by applying the preference
to the conflicts 2 and 3 we may also eliminate the repair $I'_2$. This leaves us
with only one preferred repair $I'_1$.

We observe that while the preference applies to conflicts 1, 2, and 3, it 
does not apply to conflicts 4 and 5 because conflicts 4 and 5 involve facts 
about different persons. However, the preferential resolution of conflicts 2 
and 3 implicitly resolves the conflicts 4 and 5, which may not be desirable. 
Consequently, one may find the reasons for eliminating $I'_2$ insufficient. □
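
The counts claimed in Example 2 (five conflicts, four repairs, and a preference that orients only the three same-person conflicts) can be reproduced with a short script. This is a sketch under assumptions made here: facts are encoded as `(name, salary in $k, dept)` tuples, and the helper names are hypothetical.

```python
from itertools import combinations

# Facts of I_1 encoded as (name, salary in $k, dept); the encoding is an assumption.
I1 = [
    ("Bob", 70, "R&D"), ("Mary", 40, "IT"), ("Ken", 60, "IT"),
    ("Bob", 60, "AD"), ("Mary", 50, "PR"), ("Ken", 50, "PR"),
]

def in_conflict(p, q):
    """Both FDs are keys of Mgr: distinct facts sharing a Name or a Dept conflict."""
    return p != q and (p[0] == q[0] or p[2] == q[2])

conflicts = [(p, q) for p, q in combinations(I1, 2) if in_conflict(p, q)]

def is_consistent(facts):
    return not any(in_conflict(p, q) for p, q in combinations(facts, 2))

def repairs(instance):
    """All maximal consistent subsets (brute force)."""
    consistent = [frozenset(s) for r in range(len(instance) + 1)
                  for s in combinations(instance, r) if is_consistent(s)]
    return [s for s in consistent if not any(s < t for t in consistent)]

# User preference: among conflicting facts about the same person,
# the fact with the larger salary is preferred.
preference = {(p, q) for p in I1 for q in I1
              if in_conflict(p, q) and p[0] == q[0] and p[1] > q[1]}

reps = repairs(I1)
```

Conflicts 4 and 5 join facts about different persons, so no pair for them appears in `preference`.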

In this paper we consider three different families of preferred repairs. The 
families are based on various notions of compliance of a repair with the user 
preference. For instance, in the previous example the first way of selecting 
preferred repairs is captured by global optimality (Section 4). On the other
hand the notion of Pareto optimality (Section 5) is less restrictive as it
requires stronger arguments for removing a repair from consideration. In
the previous example the repairs $I'_1$ and $I'_2$ are Pareto optimal. Conversely,
the third notion of common optimality (Section 6) is more restrictive than
global optimality. In the previous example, however, it coincides with
global optimality.

For every family of preferred repairs we present a repairing algorithm.
Each of them is sound, i.e. it produces a repair belonging to the corresponding
family of preferred repairs, and complete, i.e. every repair from the
family of preferred repairs can be constructed using the corresponding
repairing algorithm. For the family of globally optimal repairs and the family
of Pareto optimal repairs we define two pre-orders on repairs whose maximal
elements are exactly the globally optimal repairs and Pareto optimal repairs
respectively. It is an open question whether such an order can be defined
for common optimal repairs.

We also adapt two basic decision problems: repair checking |13t [2] and 
consistent query answering [3] to obtain preferred repair checking and pre- 
ferred consistent query answering. Basically, preferred repair checking is 
finding if a given database instance is a preferred repair, and preferred
consistent query answering is finding if an answer to a query is obtained in every
preferred repair.

We show that using the notion of global optimality leads to intractability
of both preferred consistent query answering, which is $\Pi^p_2$-complete, and
preferred repair checking, which is coNP-complete. The complexity is
reduced if we use the notion of Pareto optimality: preferred repair checking
is in LOGSPACE and preferred consistent query answering becomes
coNP-complete. Using common optimal repairs also reduces the complexity:
preferred consistent query answering is coNP-complete and preferred repair
checking is in PTIME. It is an open question whether in this case
preferred repair checking is PTIME-complete or in LOGSPACE. Finally, we
identify a tractable case for which preferred consistent query answering is
in PTIME for each of the aforementioned families of preferred repairs.

The contributions of this paper are: 

• A formal framework of families of preferred repairs and preferred con- 
sistent query answers for relational databases. 

• A list of desirable properties of families of preferred repairs. 

• Three different families of preferred repairs based on different notions 
of optimal compliance with the user preference. 

• A repairing algorithm for every family of preferred repairs. The
algorithms are both sound and complete.

• A thorough analysis of computational implications of preferences in 
the context of repairing and consistent query answers. 

The presented work is an extension of [24]. The current paper extends
the framework of preferred consistent query answers to denial constraints
(instead of functional dependencies), provides detailed proofs of all claims,
and presents a sound and complete repairing algorithm for every considered
family of preferred repairs (instead of the repairing algorithm for the family of
common optimal repairs only). Additionally, we further broaden the analysis
of computational complexity by identifying a family of preferred repairs
for which repair checking is in LOGSPACE, potentially offering a possibility
of parallel implementation for this decision problem.

The paper is organized as follows. In Section 2 we recall basic notions
of relational databases and the framework of repairs and consistent query
answers. In Section 3 we extend this framework with preferences on conflict
resolution. In Sections 4, 5, and 6 we present the families of globally, Pareto,
and common optimal repairs respectively. We investigate their properties
and mutual relationships, and analyze their computational implications. In
Section 7 we present a tractable case of preferred consistent query answering.
Section 8 contains a discussion of related work. Finally, in the concluding
section we summarize our results and outline directions for future work.

2 Preliminaries 

In this section we recall the basic notions of relational databases [1] and
the framework of consistent query answers [4]. A database schema $\mathcal{S}$ is a
set of relation names of fixed arity (greater than 0) whose attributes are
drawn from an infinite set of names $U$. In the sequel, we will denote relation
names by $R, P, \ldots$, elements of $U$ by $A, B, C, \ldots$, and finite subsets of $U$ by
$X, Y, Z, \ldots$. Every element of $U$ is typed but for simplicity we consider only
two disjoint infinite domains: $\mathbb{Q}$ (rationals) and $D$ (uninterpreted constants).
We assume that two constants are equal if and only if they have the same
name, and we allow the standard built-in relation symbols $=$ and $\neq$ over
$D$. We also allow the built-in relation symbols $=$, $\neq$, $<$, $\leq$, $>$, and $\geq$ with
their natural interpretation over $\mathbb{Q}$. We use these symbols together with
the vocabulary of relational names $\mathcal{S}$ to build a first-order language $\mathcal{L}$. An
$\mathcal{L}$-formula is:

• closed (or a sentence) if it has no free variables, 

• ground if it has no variables whatsoever, 

• quantifier-free if it has no quantifiers, 

• atomic if it has no quantifiers and no Boolean connectives. 

Finally, a fact is an atomic ground $\mathcal{L}$-formula.

Database instances are finite, first-order structures over the schema. Often,
we will find it more convenient to view an instance $I$ as the finite set of
all facts satisfied by the instance, i.e. $\{R(t) \mid R \in \mathcal{S}, I \models R(t)\}$.

In the sequel, we will denote tuples of variables by $\bar{x}, \bar{y}, \ldots$, tuples of
constants by $t, s, \ldots$, facts by $p, q, r, \ldots$, quantifier-free formulas using only
built-in predicates by $\varphi$, and instances by $I, J, \ldots$.
2.1 Integrity constraints

In general, an integrity constraint is a closed $\mathcal{L}$-formula. In this paper we
consider the class of denial constraints, $\mathcal{L}$-sentences of the form

$$\forall \bar{x}.\ \neg[R_1(\bar{x}_1) \land \ldots \land R_n(\bar{x}_n) \land \varphi(\bar{x})],$$

where $\varphi(\bar{x})$ is a quantifier-free formula referring to built-in relation names
only and $\bar{x}_1 \cup \ldots \cup \bar{x}_n = \bar{x}$. We also make a natural assumption that $n > 0$.

The class of denial constraints contains functional dependencies (FDs)
commonly formulated as $R : X \to Y$, where $X$ and $Y$ are sets of attributes
of $R$. An FD $R : X \to Y$ is expressed by the following denial constraint

$$\forall \bar{x}, \bar{y}_1, \bar{y}_2, \bar{z}, \bar{z}'.\ \neg[R(\bar{x}, \bar{y}_1, \bar{z}) \land R(\bar{x}, \bar{y}_2, \bar{z}') \land \neg(\bar{y}_1 = \bar{y}_2)],$$

where $\bar{x}$ is the vector of variables corresponding to the attributes $X$, and $\bar{y}_1$
and $\bar{y}_2$ are two vectors of variables corresponding to the attributes $Y$. A key
dependency is a functional dependency $R : X \to Y$, where $Y$ comprises all
attributes of $R$. If the relation name is known from context, for clarity we
omit it in our notation, i.e. we write $X \to Y$ instead of $R : X \to Y$.
Database consistency is defined in the standard way.

Definition 1 Given a database instance $I$ and a set of integrity constraints
$F$, $I$ is consistent with $F$ if $I \models F$ in the standard model-theoretic sense;
otherwise $I$ is inconsistent.

We observe that an empty instance satisfies any set of denial constraints. 
This conforms to the behavior of typical SQL database management sys- 
tems: an empty database satisfies any set of constraints expressed in SQL. 
Also, note that denial constraints can be represented using standard SQL 
assertions. We remark, however, that the converse is not necessarily the case.

2.2 Queries 

In this paper we deal only with closed queries, i.e. closed $\mathcal{L}$-formulas. The
query answers are Boolean: true or false. A query is atomic (quantifier-free)
if the $\mathcal{L}$-formula is atomic (quantifier-free respectively). A conjunctive
query is an existentially quantified conjunction of atomic $\mathcal{L}$-formulas.

Definition 2 Given an instance $I$ and a closed query $Q$, true is the answer
to $Q$ in $I$ if $I \models Q$; otherwise the answer to $Q$ in $I$ is false.
2.3 Repairing

In the original framework when repairing a database two operations are 
considered: inserting a fact and deleting a fact. In the presence of denial 
constraints inserting facts cannot resolve inconsistencies, and thus repairs 
are obtained by deleting facts only, i.e. the repairs are subsets of the original 
instance. 

Definition 3 (Repair) Given an instance $I$ and a set of denial constraints
$F$, an instance $I'$ is a repair of $I$ w.r.t. $F$ if and only if $I'$ is a maximal
subset of $I$ that is consistent with $F$. By $Rep(I, F)$ we denote the set of all
repairs of $I$ w.r.t. $F$.

To identify the facts whose mutual presence causes inconsistency we use the 
notion of a conflict. 

Definition 4 (Conflict) Given an instance $I$ and a set of denial constraints
$F$, a set of facts $\{R_1(t_1), \ldots, R_n(t_n)\} \subseteq I$ is a conflict in $I$ w.r.t. $F$ if for
some denial constraint in $F$ of the form

$$\forall \bar{x}.\ \neg[R_1(\bar{x}_1) \land \ldots \land R_n(\bar{x}_n) \land \varphi(\bar{x})]$$

there exists a substitution $\rho$ of variables $\bar{x}$ such that $\varphi(\rho(\bar{x}))$ is valid and
$\rho(\bar{x}_i) = t_i$ for every $i \in \{1, \ldots, n\}$.

We recall the notion of a conflict hypergraph that allows us to visualize all
the conflicts present in the instance [5, 12]. We recall that a hypergraph is
a generalization of an undirected graph that allows more than two nodes to
be connected by a hyperedge. Formally, a hypergraph is a pair consisting of
a set of nodes and a set of hyperedges, where a hyperedge is a subset of the
node set. Given a hypergraph $\mathcal{G}$, by $V(\mathcal{G})$ we denote its set of nodes, and
by $E(\mathcal{G})$ we denote its set of hyperedges. The conflict hypergraph $\mathcal{G}(I, F)$
of $I$ w.r.t. $F$ is a hypergraph whose set of nodes is $I$ and whose set of hyperedges
consists of all conflicts in $I$ w.r.t. $F$.

The size of the hypergraph is the size of the node set plus the sum of
cardinalities of all hyperedges. We observe that assuming $F$ to be fixed, the
maximum cardinality of every hyperedge in a conflict hypergraph is bounded
by a constant. Consequently, the size of a conflict hypergraph $\mathcal{G}(I, F)$ is
polynomial in the size of the instance $I$.

Two nodes are neighboring (or are neighbors) in a hypergraph if there
exists a hyperedge containing both nodes. The neighborhood of a node
$v \in V(\mathcal{G})$ in a hypergraph $\mathcal{G}$ is

$$n_{\mathcal{G}}(v) = \{v' \in V(\mathcal{G}) \mid \exists e \in E(\mathcal{G}).\ \{v, v'\} \subseteq e\}.$$
A hyperedge connecting exactly two nodes is called simply an edge and a
hypergraph having only edges is called a graph. Similarly, we define the
conflict graph. The conflict graph for the instance in Example 1 is in Figure 1.
The conflict hypergraph is also a compact representation of all repairs, as
the fact recalled below shows.

[Figure 1: Conflict graph, with nodes $Mgr(Mary, \$70k, IT)$, $Emp(John, \$40k, IT)$, $Emp(John, \$50k, IT)$, and $Emp(John, \$80k, IT)$.]

Proposition 1 ([5, 12]) A maximal independent set of $\mathcal{G}(I, F)$ is any maximal
set of vertices that contains no hyperedge. Any maximal independent
set is a repair of $I$ w.r.t. $F$ and vice versa.
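
To make the hypergraph notions concrete, the sketch below enumerates maximal independent sets and computes neighborhoods for a small hypothetical hypergraph with one ternary hyperedge. The node names, the hyperedges, and the function names are assumptions chosen for illustration; the enumeration is brute force.

```python
from itertools import combinations

def is_independent(node_set, hyperedges):
    """A set of nodes is independent if it contains no hyperedge entirely."""
    return not any(e <= node_set for e in hyperedges)

def maximal_independent_sets(nodes, hyperedges):
    """Brute-force enumeration; by Proposition 1 these are exactly the repairs."""
    independent = [frozenset(s) for r in range(len(nodes) + 1)
                   for s in combinations(sorted(nodes), r)
                   if is_independent(frozenset(s), hyperedges)]
    return {s for s in independent if not any(s < t for t in independent)}

def neighborhood(v, nodes, hyperedges):
    """n_G(v): all nodes sharing some hyperedge with v (v itself included)."""
    return {w for w in nodes if any({v, w} <= e for e in hyperedges)}

# Hypothetical conflict hypergraph: one ternary conflict and one binary conflict.
nodes = {"a", "b", "c", "d"}
hyperedges = [frozenset("abc"), frozenset("cd")]

mis = maximal_independent_sets(nodes, hyperedges)
```

Note that `{a, b, d}` is independent even though it contains two nodes of the ternary hyperedge: a conflict is resolved as soon as one of its facts is removed.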

We recall that when handling only one key dependency (per relation name),
the conflict graph is a union of pairwise disjoint cliques and every repair
consists of exactly one element from each clique [5]. To generalize this
observation to FDs we assume only one relation name $R$ and one functional
dependency $R : X \to Y$. Now, given an instance $I$, an $X$-cluster is the set of
all facts (of $R$) in $I$ that have the same attribute value in $X$, and similarly, an
$(X, Y)$-cluster is the set of all facts (of $R$) in $I$ that have the same attribute
value in $X$ and $Y$. Clearly, an $X$-cluster is a union of all $(X, Y)$-clusters with
the same attribute value in $X$. We recall that every repair contains exactly
one $(X, Y)$-cluster from each $X$-cluster. We also remark that conflicts are
present only inside an $X$-cluster and two facts from the same $X$-cluster form
a conflict if and only if they belong to different $(X, Y)$-clusters.

Finally, we recall the basic database repairing algorithm [23]. Algorithm 1
is both sound, i.e. it always produces a repair, and complete, i.e. any repair
can be produced with it.
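
A minimal Python sketch of Algorithm 1 follows, run on the instance of Example 1. The nondeterministic choice is modelled here by an explicit order in which facts are picked; that modelling, the tuple encoding of facts, and all function names are assumptions of this sketch. Iterating over all orders illustrates completeness on this small instance.

```python
from itertools import combinations, permutations

# Instance I_0 of Example 1, encoded as (relation, name, salary in $k, dept).
I0 = [("Emp", "John", 40, "IT"), ("Emp", "John", 50, "IT"),
      ("Emp", "John", 80, "IT"), ("Mgr", "Mary", 70, "IT")]

def in_conflict(p, q):
    if p[0] == "Mgr":
        p, q = q, p  # put the Emp fact first when there is one
    if p[0] == "Emp" and q[0] == "Emp":
        return p != q and p[1] == q[1]          # key dependency on Emp
    if p[0] == "Emp" and q[0] == "Mgr":
        return p[3] == q[3] and p[2] > q[2]     # employee earns more than manager
    return False

def is_consistent(facts):
    return not any(in_conflict(p, q) for p, q in combinations(facts, 2))

def construct_repair(choice_order, is_consistent):
    """Algorithm 1, with the choices made in the given order."""
    remaining = list(choice_order)   # I^0 <- I
    J = set()                        # J <- empty set
    while remaining:
        fact = remaining.pop(0)      # choose R(t) and remove it from I^0
        if is_consistent(J | {fact}):
            J.add(fact)              # keep the fact only if J stays consistent
    return frozenset(J)

# Every sequence of choices yields a repair; together they yield all repairs.
all_results = {construct_repair(p, is_consistent) for p in permutations(I0)}
```

Soundness relies on the fact that, for denial constraints, a fact rejected against the current $J$ stays in conflict with every later superset of $J$, so the returned set is maximal.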

2.4 Complexity classes 

We make use of the following complexity classes: 
Algorithm 1 Constructing a repair of $I$ w.r.t. $F$

    $I^0 \leftarrow I$
    $J \leftarrow \emptyset$
    while $I^0 \neq \emptyset$ do
        choose $R(t) \in I^0$
        $I^0 \leftarrow I^0 \setminus \{R(t)\}$
        if $J \cup \{R(t)\} \models F$ then
            $J \leftarrow J \cup \{R(t)\}$
    return $J$



• LOGSPACE: the class of decision problems solvable in logarithmic 
space by deterministic Turing machines (the input tape is read-only) 

• PTIME: the class of decision problems solvable in polynomial time by 
deterministic Turing machines; 

• coNP: the class of decision problems whose complements are solvable 
in polynomial time by nondeterministic Turing machines; 

• $\Pi^p_2$: the class of decision problems whose complements are solvable
in polynomial time by nondeterministic Turing machines with an NP
oracle.

3 Conflict resolution preferences 

To represent the preference information we use an acyclic relation on pairs 
of neighboring facts, i.e. pairs of facts present in a conflict. 

Definition 5 (Priority) Given an instance $I$ and a set of denial constraints
$F$, a priority $\succ$ of $I$ w.r.t. $F$ is a binary relation on $I$ such that: (1) $\succ$ is
acyclic and (2) for every $R(t), R'(t') \in I$, if $R(t) \succ R'(t')$, then $R(t)$ and
$R'(t')$ are neighbors.

In the sequel, we omit the reference to the instance $I$ and the set of denial
constraints $F$ if they are known from the context.

From the point of view of the user interface it is often more natural to define
the priority as some acyclic binary relation on facts of $I$ and then consider
the restriction of the priority relation to the conflicting facts. Clearly, this
approach can be handled with the notion of priorities.
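
The two conditions of Definition 5, and the restriction of an arbitrary acyclic relation to conflicting facts, can be sketched as follows. Relations are represented as sets of pairs `(x, y)` meaning $x \succ y$ and conflicts as frozensets; this representation and the function names are assumptions made for illustration.

```python
def is_acyclic(relation):
    """Depth-first search for a cycle in the directed graph of the relation."""
    succ = {}
    for x, y in relation:
        succ.setdefault(x, set()).add(y)
    state = {}  # node -> "open" while on the DFS stack, "done" afterwards

    def visit(v):
        if state.get(v) == "done":
            return True
        if state.get(v) == "open":
            return False  # reached a node already on the stack: a cycle
        state[v] = "open"
        ok = all(visit(w) for w in succ.get(v, ()))
        state[v] = "done"
        return ok

    return all(visit(v) for v in list(succ))

def are_neighbors(p, q, hyperedges):
    return any({p, q} <= e for e in hyperedges)

def restrict_to_neighbors(relation, hyperedges):
    """Keep only the pairs that relate facts occurring together in a conflict."""
    return {(p, q) for p, q in relation if are_neighbors(p, q, hyperedges)}

def is_priority(relation, hyperedges):
    """Definition 5: acyclic and defined on pairs of neighboring facts only."""
    return is_acyclic(relation) and all(
        are_neighbors(p, q, hyperedges) for p, q in relation)

# Hypothetical conflict graph: a conflicts with b, and b conflicts with c.
hyperedges = [frozenset("ab"), frozenset("bc")]
user_relation = {("a", "b"), ("a", "c")}
priority = restrict_to_neighbors(user_relation, hyperedges)
```

Restricting an acyclic relation can only remove pairs, so the restriction is again acyclic and therefore a priority.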
To visualize the priority we use the notion of prioritized conflict
hypergraph. Basically, we superimpose the conflict hypergraph and the directed
graph of the priority relation. The examples we present in this paper use
only conflict graphs, i.e. conflict hypergraphs where edges connect exactly
two nodes. Consequently, a prioritized graph can be seen as a graph with some
of its edges oriented. For instance, Figure 2 contains the conflict graph for
the instance in Example 1 with the priority corresponding to the preference
considered subsequently.





[Figure 2: Prioritized conflict graph, with nodes including $Emp(John, \$40k, IT)$, $Emp(John, \$80k, IT)$, and $Emp(John, \$50k, IT)$.]



Definition 6 (Priority extension) Given an instance $I$, a set of denial
constraints $F$, and two priorities $\succ$ and $\succ'$ of $I$ w.r.t. $F$, $\succ'$ is an extension
of $\succ$, denoted $\succ \subseteq \succ'$, if and only if $R(t) \succ' R'(t')$ whenever $R(t) \succ R'(t')$
for $R(t), R'(t') \in I$. A priority $\succ$ of $I$ w.r.t. $F$ is total if there exists no
priority $\succ'$ of $I$ w.r.t. $F$ that is different from $\succ$ and extends $\succ$.

Note that a total priority is also acyclic and defined on pairs of neighboring 
facts only. 

Proposition 2 A priority $\succ$ is total if and only if for every conflict $C$ and
any two facts $x_1, x_2 \in C$ we have that either $x_1 \succ x_2$ or $x_2 \succ x_1$.

Proof The if part is trivial. For the only if part suppose there is a priority
$\succ$ that is total yet there exist neighboring $x_1$ and $x_2$ such that $x_1 \not\succ x_2$
and $x_2 \not\succ x_1$. Since $\succ$ is total, neither of its two proper extensions is a
priority, i.e. both $\succ_1 = \succ \cup \{(x_1, x_2)\}$ and $\succ_2 = \succ \cup \{(x_2, x_1)\}$ are
cyclic. Since $\succ$ is not cyclic, $\succ_1$ has a cycle that traverses $(x_1, x_2)$, i.e. there
exists a chain $x_2 \succ y_1 \succ \ldots \succ y_n \succ x_1$. Similarly, $\succ_2$ being cyclic implies
that there exists a chain $x_1 \succ z_1 \succ \ldots \succ z_m \succ x_2$. Together this implies
that $x_1 \succ \ldots \succ x_2 \succ \ldots \succ x_1$, i.e. $\succ$ is cyclic. To finish the proof we
observe that the acyclicity of priority implicitly excludes the possibility of
both $x \succ y$ and $y \succ x$ being true at the same time for some facts $x$ and $y$. □
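
Proposition 2 gives a direct test for totality that avoids searching for extensions. A minimal sketch, assuming the same hypothetical representation as before (a priority as a set of pairs `(x, y)` meaning $x \succ y$, conflicts as frozensets) and assuming the input is already a valid priority:

```python
from itertools import combinations

def is_total(priority, hyperedges):
    """Proposition 2: every two facts of every conflict are comparable."""
    return all((p, q) in priority or (q, p) in priority
               for e in hyperedges
               for p, q in combinations(tuple(e), 2))

# Hypothetical conflict graph: a conflicts with b, and b conflicts with c.
hyperedges = [frozenset("ab"), frozenset("bc")]

partial = {("a", "b")}                 # the conflict {b, c} is left unoriented
total = {("a", "b"), ("c", "b")}       # every conflict is oriented
```

Here `partial` can still be extended by orienting the conflict between `b` and `c` in either direction, while `total` admits no proper extension.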
3.1 Preferred repairs and consistent query answers



Now, we introduce the general framework. We begin by defining a general 
notion of a family of preferred repairs. We do not make any assumptions 
on how such a family constructs preferred repairs. For generality, we do 
not even assume that the constructed instances are repairs in the sense of 
Definition 3. Instead, we list later on the desirable properties a well-behaved
family should satisfy. 

Definition 7 (Preferred repairs) A family of preferred repairs is a function
$\mathcal{X}Rep$ defined on triplets $(I, F, \succ)$, where $\succ$ is a priority in $I$ w.r.t. a
set of denial constraints $F$, such that $\mathcal{X}Rep(I, F, \succ)$ is a set of database
instances over the same schema. We say that a family $\mathcal{Y}Rep$ subsumes a
family $\mathcal{X}Rep$, denoted $\mathcal{X}Rep \sqsubseteq \mathcal{Y}Rep$, if $\mathcal{X}Rep(I, F, \succ) \subseteq \mathcal{Y}Rep(I, F, \succ)$ for
every $(I, F, \succ)$.



We generalize the notion of consistent query answers ^ by considering 
only preferred repairs when evaluating a query (instead of all repairs). We 
can easily generalize our approach to open queries as in [12, 14].

Definition 8 ($\mathcal{X}$-preferred consistent query answer) Given a closed
query $Q$, a triple $(I, F, \succ)$, and a family of preferred repairs $\mathcal{X}Rep$, true
(false) is the $\mathcal{X}$-preferred consistent query answer to $Q$ in $I$ w.r.t. $F$ and $\succ$
if for every $I' \in \mathcal{X}Rep(I, F, \succ)$ we have $I' \models Q$ ($I' \not\models Q$ respectively).

Note that we obtain the original notion of consistent query answer if we
consider the family of all repairs $Rep(I, F)$.
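
Definition 8 can be read as a small procedure over an explicitly given set of preferred repairs. The sketch below uses the repairs and the query $Q_0$ of Example 1; the function name, the tuple encoding of facts, and the use of `None` for "neither true nor false is a preferred consistent answer" are assumptions made here.

```python
def preferred_consistent_answer(preferred_repairs, query):
    """True/False if the query has that value in every preferred repair, else None."""
    if not preferred_repairs:
        # With no preferred repairs both answers would hold vacuously.
        raise ValueError("the set of preferred repairs is empty")
    outcomes = {bool(query(r)) for r in preferred_repairs}
    return outcomes.pop() if len(outcomes) == 1 else None

# Repairs of I_0 from Example 1, encoded as (relation, name, salary in $k, dept).
r1 = frozenset({("Emp", "John", 80, "IT")})
r2 = frozenset({("Emp", "John", 50, "IT"), ("Mgr", "Mary", 70, "IT")})
r3 = frozenset({("Emp", "John", 40, "IT"), ("Mgr", "Mary", 70, "IT")})

def q0(facts):
    """Q_0: does John earn more than $60k?"""
    return any(f[0] == "Emp" and f[1] == "John" and f[2] > 60 for f in facts)

over_all_repairs = preferred_consistent_answer([r1, r2, r3], q0)
over_preferred = preferred_consistent_answer([r2, r3], q0)
```

Over all repairs neither answer is consistent, whereas over the preferred repairs $I'_2$ and $I'_3$ the answer is false, as stated in the introduction.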

3.2 Desirable properties of preferred repairs 

Now, we identify desirable properties of arbitrary families of preferred re- 
pairs. The properties should be satisfied for an arbitrary instance I and an 
arbitrary set of denial constraints F. 

$\mathcal{P}1$ Non-emptiness

Because the set of preferred repairs is used to define preferred consistent
query answers, it is important that for any preference information the
framework is not trivialized by an empty set of preferred repairs:

$$\mathcal{X}Rep(I, F, \succ) \neq \emptyset.$$

$\mathcal{P}2$ Monotonicity

The operation of extending the preference allows us to improve the state of
our knowledge of the real world. The better such knowledge is, the finer
the (preferred consistent) answers we should obtain. This is achieved if
extending the preference can only narrow the set of preferred repairs:

$$\succ_1 \subseteq \succ_2 \ \Rightarrow\ \mathcal{X}Rep(I, F, \succ_2) \subseteq \mathcal{X}Rep(I, F, \succ_1).$$

$\mathcal{P}3$ Non-discrimination

Removing repairs from consideration must be justified by existing preference
information. In particular, no repair should be removed if no preference is
given:

$$\mathcal{X}Rep(I, F, \emptyset) = Rep(I, F).$$

$\mathcal{P}4$ Categoricity

Ideally, a preference that cannot be further extended (the priority is total)
should specify how to resolve every conflict:

$$\succ \text{ is total} \ \Rightarrow\ |\mathcal{X}Rep(I, F, \succ)| = 1.$$

$\mathcal{P}5$ Conservativeness

We also note that properties $\mathcal{P}2$ and $\mathcal{P}3$ imply that preferred repairs are a
subset of all repairs:

$$\mathcal{X}Rep(I, F, \succ) \subseteq Rep(I, F).$$
3.3 Data complexity 

We also adapt the decision problems to include the priority. Note that the 
priority relation is of size quadratic in the size of the database instance, and 
therefore it is natural to make it a part of the input. For a family XRep of 
preferred repairs the decision problems we study are defined as follows: 

(i) $\mathcal{X}$-preferred repair checking, i.e. the complexity of the following set

$$\{(I, \succ, I') : I' \in \mathcal{X}Rep(I, F, \succ)\}.$$

(ii) $\mathcal{X}$-preferred consistent query answering, i.e. the complexity of the
following set

$$\mathcal{D}^{\mathcal{X}}_{F,Q} = \{(I, \succ) : \forall I' \in \mathcal{X}Rep(I, F, \succ).\ I' \models Q\}.$$



4 Globally optimal repairs 



We investigate several different families of preferred repairs. We start by 
investigating a family based on a notion of repair optimality inspired by work 
on preferred models of logic programs [25] and preferential reasoning [20] . 



Definition 9 (Globally optimal repairs $\mathcal{G}Rep$) Given an instance $I$, a
set of denial constraints $F$, and a priority $\succ$, an instance $I' \subseteq I$ is globally
optimal w.r.t. $\succ$ and $F$ if no nonempty subset $X$ of facts from $I'$ can be
replaced with a subset $Y$ of $I \setminus I'$ such that

$$\forall x \in X.\ \exists y \in Y.\ y \succ x \qquad (\star_g)$$

and the resulting set of facts is consistent with $F$. $\mathcal{G}Rep$ is the family of
globally optimal repairs, i.e. $\mathcal{G}Rep(I, F, \succ)$ is the set of all repairs of $I$ w.r.t.
$F$ that are globally optimal w.r.t. $\succ$ and $F$.

In the sequel, we fix an instance $I$ and a set of denial constraints $F$, and
omit them when referring to the elements of $\mathcal{G}Rep(I, F, \succ)$.

The notion of global optimality identifies repairs whose compliance with
the priority cannot be further improved. For the instance $I_0$ in Example 1
with the priority in Figure 2 the set of globally optimal repairs consists of
$I'_2$ and $I'_3$.

Before investigating the properties of $\mathcal{G}Rep$ we present an alternative
characterization of globally optimal repairs.

Proposition 3 For a given priority $\succ$ and two repairs $I'_1$ and $I'_2$, $I'_1$ is
globally preferred over $I'_2$, denoted $I'_1 \gg_g I'_2$, if

$$\forall x \in I'_2 \setminus I'_1.\ \exists y \in I'_1 \setminus I'_2.\ y \succ x. \qquad (\gg_g)$$

The following facts hold:

(i) a repair $I'$ is globally optimal if and only if it is $\gg_g$-maximal, i.e. there
is no repair $I''$ different from $I'$ such that $I'' \gg_g I'$;

(ii) if $\succ$ is acyclic, then so is $\gg_g$.
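
Part (i) of Proposition 3 suggests a simple way to select the globally optimal repairs from an explicit list of repairs. The sketch below applies it to the three repairs of Example 1. The priority used, $Mgr(Mary, \$70k, IT) \succ Emp(John, \$80k, IT)$, is an assumption based on the preference described in the introduction (remove the employee rather than the manager); the encoding and function names are likewise hypothetical.

```python
def globally_preferred(r1, r2, priority):
    """r1 >>_g r2: every fact in r2 but not r1 is dominated by a fact in r1 but not r2."""
    return all(any((y, x) in priority for y in r1 - r2) for x in r2 - r1)

def globally_optimal(repairs, priority):
    """Repairs that are >>_g-maximal among the given repairs (Proposition 3 (i))."""
    return [r for r in repairs
            if not any(s != r and globally_preferred(s, r, priority) for s in repairs)]

# Facts and repairs of Example 1, encoded as (relation, name, salary in $k, dept).
EMP40 = ("Emp", "John", 40, "IT")
EMP50 = ("Emp", "John", 50, "IT")
EMP80 = ("Emp", "John", 80, "IT")
MGR = ("Mgr", "Mary", 70, "IT")

r1 = frozenset({EMP80})
r2 = frozenset({EMP50, MGR})
r3 = frozenset({EMP40, MGR})

# Assumed priority: the manager fact is preferred over the conflicting employee fact.
priority = {(MGR, EMP80)}

optimal = globally_optimal([r1, r2, r3], priority)
```

With this priority, $I'_1$ is dominated by both other repairs, so only $I'_2$ and $I'_3$ remain, in agreement with the text above.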

Proof (i) We prove the contraposition, i.e. $I'$ is not globally optimal if
and only if there exists a repair $I'' \neq I'$ such that $I'' \gg_g I'$.

For the if part take $X = I' \setminus I''$ and $Y = I'' \setminus I'$, and note that $(\star_g)$ follows
from $(\gg_g)$. Naturally, $(I' \setminus X) \cup Y = I''$ is consistent. For the only if part
take any nonempty $X \subseteq I'$ and $Y \subseteq I \setminus I'$ such that $(\star_g)$ is satisfied and
$J = (I' \setminus X) \cup Y$ is consistent. We take any repair $I''$ that contains $J$. Such
a repair exists since $J$ is consistent. Clearly, $I' \setminus I'' \subseteq X$ and also $Y \subseteq I'' \setminus I'$.
Hence $I'' \gg_g I'$ follows from $(\star_g)$. Consequently, $I'$ is not globally optimal.

(ii) Suppose $\gg_g$ is cyclic, i.e. there exists a sequence of different repairs
$I'_0, \ldots, I'_{n-1}$ such that $I'_{i+1} \gg_g I'_i$ for $i \in \{0, \ldots, n-1\}$, where the $+$ operator
is interpreted modulo $n$. We show that $\succ$ is cyclic as well. We construct
inductively an infinite sequence of facts $y_1, y_2, \ldots$ and a sequence of numbers
$k_1, k_2, \ldots$ such that $y_{j+1} \succ y_j$, $y_j \notin I'_{k_j}$, and $y_j \in I'_{k_j+1}$ for $j \in \mathbb{N}$.

For $j = 1$ let $y_1$ be any element of $I'_1 \setminus I'_0$ and $k_1 = 0$. Now, suppose
we constructed the two sequences up to their $j$-th elements $y_j$ and $k_j$ such
that $y_j \notin I'_{k_j}$ and $y_j \in I'_{k_j+1}$. If $y_j \in I'_0$ then $y_j$ must have been pushed out
somewhere between $I'_0$ and $I'_{k_j}$, i.e. there exists $k_{j+1} \in \{0, \ldots, k_j - 1\}$ such
that $y_j \in I'_{k_{j+1}}$ and $y_j \notin I'_{k_{j+1}+1}$. By $I'_{k_{j+1}+1} \gg_g I'_{k_{j+1}}$ there exists an element
$y_{j+1} \in I'_{k_{j+1}+1} \setminus I'_{k_{j+1}}$ such that $y_{j+1} \succ y_j$. The case when $y_j \notin I'_0$ is treated
symmetrically: $y_j$ must have been pushed out somewhere between $I'_{k_j+1}$ and
$I'_n = I'_0$.

Clearly, $I$ has only a finite number of elements and thus any infinite
$\succ$-chain must have a repetition, thus $\succ$ is cyclic. □

Proposition 4 $\mathcal{G}Rep$ satisfies the properties $\mathcal{P}1$-$\mathcal{P}4$.

Proof We get $\mathcal{P}1$ by acyclicity of $\gg_g$ and Proposition 3. To show $\mathcal{P}2$ we
observe that if a repair is globally optimal w.r.t. $\succ_2$, then it is globally
optimal w.r.t. any $\succ_1$ such that $\succ_1 \subseteq \succ_2$. $\mathcal{P}3$ follows directly from the definition:
to show that a repair is not globally optimal, $\succ$ needs to be nonempty.

Showing $\mathcal{P}4$ requires a more elaborate argument. Take a total $\succ$. By $\mathcal{P}1$
there exists at least one globally optimal repair. Suppose that there exist
two different globally optimal repairs $I'_0$ and $I'_1$. In the remaining part of
the proof for $i \geq 2$ we let $I'_i = I'_{i \bmod 2}$. We will show that $\succ$ is cyclic by
creating an infinite chain $\ldots \succ x_1 \succ x_0$ such that $x_i \in I'_i \setminus I'_{i+1}$ for every
$i \in \mathbb{N}$. For $x_0$ we take any element from $I'_0 \setminus I'_1$. Now, assuming that the
sequence has been defined up to the $i$-th element $x_i$, we choose $x_{i+1}$ to
be any element of $I'_{i+1} \setminus I'_i$ such that $x_{i+1} \succ x_i$. We show the existence of
$x_{i+1}$ using the global optimality of $I'_{i+1}$. First, we observe that the instance
$I'_{i+1} \cup \{x_i\}$ is inconsistent since $x_i \notin I'_{i+1}$ and $I'_{i+1}$ is a repair, i.e. a maximal
consistent subset of $I$. Let $C_1, \ldots, C_k$ be the sequence of all conflicts present
in $I'_{i+1} \cup \{x_i\}$. Clearly, for every $j \in \{1, \ldots, k\}$ the conflict $C_j$ contains a
fact $z_j \notin I'_i$ since $C_j \not\subseteq I'_i$ by the consistency of $I'_i$. Let $X = \{z_1, \ldots, z_k\}$
and $Y = \{x_i\}$. Naturally, $(I'_{i+1} \setminus X) \cup Y$ is consistent, and thus by global
optimality of $I'_{i+1}$ there exists an element $x_{i+1} \in X$ such that $x_i \not\succ x_{i+1}$. But
by totality of $\succ$, and the fact that every element of $X$ is a neighbor of $x_i$,
we have that $x_{i+1} \succ x_i$. Clearly, $x_{i+1} \notin I'_i$ and moreover $x_{i+1} \in I'_{i+1}$ because
$x_{i+1} \in C_j \setminus \{x_i\} \subseteq I'_{i+1}$ for some $j \in \{1, \ldots, k\}$. □

Now, we present Algorithm 2 that constructs globally optimal repairs. It begins with an arbitrary repair obtained with Algorithm 1 and then iteratively attempts to improve the repair's conformance with the priority.

Algorithm 2 Constructing a globally optimal repair of $I$ w.r.t. $F$ and $\succ$

1: construct a repair $I'$ /* Algorithm 1 */
2: while $\exists X \subseteq I'.\ \exists Y \subseteq I \setminus I'.\ X \neq \emptyset \wedge (I' \setminus X) \cup Y \models F \wedge \forall x \in X.\ \exists y \in Y.\ y \succ x$ do
3:     $J \leftarrow (I' \setminus X) \cup Y$
4:     extend $J$ to a repair $I''$ /* Algorithm 1 */
5:     $I' \leftarrow I''$
6: return $I'$



Naturally, Algorithm 2 is sound because its main loop stops only if the instance $I'$ is globally optimal. It is also complete because it is based on Algorithm 1 which constructs any repair, in particular the globally optimal ones. We observe that if $I'_i$ is the repair constructed in the $i$-th iteration of the main loop, then $I'_{i+1} \gg_{\mathcal{G}} I'_i$. Since $\gg_{\mathcal{G}}$ is acyclic and the number of repairs is bounded by an exponential function of the size of $I$, the algorithm performs at most an exponential number of iterations. Checking global optimality (line 2) can be done in exponential time, and thus the algorithm works in exponential time.
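The iterative improvement described above can be sketched in Python. This is only an illustration under assumptions that are not part of the text: facts are strings, the constraints are represented directly by their conflicts (sets of facts that cannot coexist), and the priority is a set of pairs `(y, x)` meaning $y \succ x$. All function names are hypothetical, `extend_to_repair` merely stands in for Algorithm 1, and the search for $X$ and $Y$ is the naive exponential enumeration.

```python
from itertools import chain, combinations


def consistent(facts, conflicts):
    """A set of facts is consistent if it contains no conflict entirely."""
    return not any(conflict <= facts for conflict in conflicts)


def extend_to_repair(seed, instance, conflicts):
    """Greedily extend a consistent set to a maximal consistent subset.

    Stand-in for Algorithm 1; the fixed (sorted) order is an assumption.
    """
    repair = set(seed)
    for fact in sorted(instance - repair):
        if consistent(repair | {fact}, conflicts):
            repair.add(fact)
    return frozenset(repair)


def subsets(items, min_size=0):
    """All subsets of `items` with at least `min_size` elements."""
    items = sorted(items)
    sizes = range(min_size, len(items) + 1)
    return chain.from_iterable(combinations(items, k) for k in sizes)


def find_improvement(repair, instance, conflicts, prefers):
    """Return (X, Y) showing that `repair` is not globally optimal, else None."""
    for xs in subsets(repair, min_size=1):
        for ys in subsets(instance - repair):
            X, Y = set(xs), set(ys)
            # Every removed fact must be dominated by some added fact.
            dominated = all(any((y, x) in prefers for y in Y) for x in X)
            if dominated and consistent((repair - X) | Y, conflicts):
                return X, Y
    return None


def globally_optimal_repair(instance, conflicts, prefers):
    """Start from an arbitrary repair and improve it until no (X, Y) exists."""
    repair = extend_to_repair(set(), instance, conflicts)
    witness = find_improvement(repair, instance, conflicts, prefers)
    while witness is not None:
        X, Y = witness
        repair = extend_to_repair((repair - X) | Y, instance, conflicts)
        witness = find_improvement(repair, instance, conflicts, prefers)
    return repair
```

For a four-fact instance whose conflicts form the cycle a1, b1, a2, b2 with b1 preferred to a1 and b2 preferred to a2, the sketch starts from the repair {a1, a2} and replaces it with {b1, b2} in a single iteration.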

Theorem 1 Algorithm 2 is a sound and complete algorithm constructing globally optimal repairs. It works in time exponential in the size of the input instance and the priority relation.

The following example shows that the exponential bound on the number of 
iterations of Algorithm [2] is tight. 

Example 3 For a given $n \in \mathbb{N}$ we construct an instance $I_n$ and a priority $\succ_n$ such that the size of $I_n$ is $O(n)$, the size of $\succ_n$ is $O(n^2)$, and there exists a $\gg_{\mathcal{G}}$-chain of length $\Omega(2^n)$.

Intuitively, we construct a chain of repairs which emulate an $n$-bit binary counter, incremented from 0 to $2^n - 1$. Incrementing a counter consists of setting to 1 the least significant bit with value 0 and setting to 0 all the preceding bits (up to this point all set to 1). This last step can be seen as a (cascading) propagation of the carry bit.

We work with instances of one relation only, $R(A, B)$, and the constructed instances use the following facts:

• $p_i^0 = R(i, 0)$ representing the $i$-th bit set to 0, for $i \in \{0, \ldots, n-1\}$,

• $p_i^1 = R(i, 1)$ representing the $i$-th bit set to 1, for $i \in \{0, \ldots, n-1\}$,

• $p_i^c = R(i, 2)$ representing the $i$-th bit being carried over to the $(i+1)$-th bit, for $i \in \{0, \ldots, n-2\}$.

To ensure proper behavior of the counter we use the following three constraints:

$$R : A \rightarrow B,$$
$$\forall i, j.\ \neg[R(i, 2) \wedge R(j, 1) \wedge i > j],$$
$$\forall i, j.\ \neg[R(i, 1) \wedge R(j, 2) \wedge j = i - 1].$$

The first constraint ensures that a bit is set to 0, set to 1, or being carried to the higher bit. The second constraint ensures that propagating a carry bit resets all lower bits to 0. The third constraint ensures that a bit can be carried over only if the immediately higher bit is set to 0. The correct direction of increment is ensured by the priority relation defined as:

$$p_i^1 \succ_n p_i^0 \quad \text{for } i \in \{0, \ldots, n-1\},$$
$$p_i^1 \succ_n p_{i-1}^c \quad \text{for } i \in \{1, \ldots, n-1\},$$
$$p_i^c \succ_n p_j^1 \quad \text{for } i \in \{0, \ldots, n-2\} \text{ and } j \in \{0, \ldots, i\}.$$

The constructed chain of repairs corresponds to subsequent natural numbers ranging from 0 to $2^n - 1$. Additionally, for odd numbers the chain contains also repairs that represent the cascading propagation of the carry bit. Figure 3 contains an example of an instance $I_3$ and a sequence of repairs that constitute a $\gg_{\mathcal{G}}$-chain. For instance, $I'_0$ and $I'_1$ correspond to 0 and 1 respectively, while $I'_{1,c}$ corresponds to 1 being incremented with a carry bit. In general, for a given $n \in \mathbb{N}$ the constructed instance is

$$I_n = \{p_0^0, p_0^1, p_0^c, p_1^0, p_1^1, p_1^c, \ldots, p_{n-1}^0, p_{n-1}^1\}.$$

For every $i \in \{0, \ldots, 2^n - 1\}$ let $(b_0^i, b_1^i, \ldots, b_{n-1}^i)$ be the binary representation of $i$, where $b_j^i \in \{0, 1\}$ and $b_0^i$ denotes the least significant bit, i.e. $\sum_{j=0}^{n-1} b_j^i 2^j = i$. The repair corresponding to $i \in \{0, \ldots, 2^n - 1\}$ is

$$I'_i = \{p_0^{b_0^i}, p_1^{b_1^i}, \ldots, p_{n-1}^{b_{n-1}^i}\}.$$

Figure 3: The instance $I_3$ and the chain $I'_7 \gg_{\mathcal{G}} I'_6 \gg_{\mathcal{G}} I'_{5,c} \gg_{\mathcal{G}} I'_5 \gg_{\mathcal{G}} I'_4 \gg_{\mathcal{G}} I'_{3,c} \gg_{\mathcal{G}} I'_3 \gg_{\mathcal{G}} I'_2 \gg_{\mathcal{G}} I'_{1,c} \gg_{\mathcal{G}} I'_1 \gg_{\mathcal{G}} I'_0$.

For every odd $i \in \{1, 3, \ldots, 2^n - 3\}$ we also construct the repair that propagates the carry bit in a cascading fashion

$$I'_{i,c} = \{p_0^0, \ldots, p_{j_i-2}^0, p_{j_i-1}^c, p_{j_i}^{b_{j_i}^i}, \ldots, p_{n-1}^{b_{n-1}^i}\},$$

where $j_i$ is the position of the least significant bit of the binary representation of $i$ that is set to 0, i.e. the minimal $j$ such that $b_j^i = 0$. It can be easily shown that

$$\ldots \gg_{\mathcal{G}} I'_{3,c} \gg_{\mathcal{G}} I'_3 \gg_{\mathcal{G}} I'_2 \gg_{\mathcal{G}} I'_{1,c} \gg_{\mathcal{G}} I'_1 \gg_{\mathcal{G}} I'_0.$$

Finally, we observe that Algorithm 2 may traverse the full length of the constructed chain during its execution with $I_n$ and $\succ_n$. We remark, however, that in this example the globally optimal repair $I'_{2^n-1}$ may be attained in one step from any of the constructed repairs, i.e. $I'_{2^n-1} \gg_{\mathcal{G}} I'_i$ for $i \in \{0, \ldots, 2^n - 2\}$ and $I'_{2^n-1} \gg_{\mathcal{G}} I'_{i,c}$ for $i \in \{1, 3, \ldots, 2^n - 3\}$. □

Now, we investigate computational properties of globally optimal repairs. We observe that verifying whether a repair $I'$ is not globally optimal can be easily accomplished with a nondeterministic Turing machine: it suffices to guess the sets $X$ and $Y$, verify that $(I' \setminus X) \cup Y$ is consistent, and check that the condition of Definition 9 holds. Consequently, $\mathcal{B}_F^{\mathcal{G}}$ is in coNP. The membership of $\mathcal{D}_{F,Q}^{\mathcal{G}}$ in $\Pi_2^p$ follows from Definition 8: true is not the $\mathcal{G}$-preferred consistent answer to a query if the query is not true in some globally optimal repair.

Proposition 5 $\mathcal{G}$-preferred repair checking is in coNP and $\mathcal{G}$-preferred consistent query answering is in $\Pi_2^p$.

The upper bounds are tight. 

Theorem 2 There exists a set of 4 FDs and an atomic query for which $\mathcal{G}$-preferred repair checking is coNP-hard and $\mathcal{G}$-preferred consistent query answering is $\Pi_2^p$-hard.

Proof We show $\Pi_2^p$-hardness of $\mathcal{D}_{F,Q}^{\mathcal{G}}$ by reducing the satisfaction of $\forall^*\exists^*$QBF formulas to $\mathcal{D}_{F,Q}^{\mathcal{G}}$. Consider the following formula:

$$\Psi = \forall x_1, \ldots, x_n.\ \exists x_{n+1}, \ldots, x_{n+m}.\ \Phi,$$

where $\Phi$ is quantifier-free and is in 3CNF, i.e. $\Phi$ equals $c_1 \wedge \ldots \wedge c_s$, and $c_k$ are clauses of three literals $\ell_{k,1} \vee \ell_{k,2} \vee \ell_{k,3}$. We call the variables $x_1, \ldots, x_n$ universal and $x_{n+1}, \ldots, x_{n+m}$ existential. We use the function $q$ to identify the type of a variable with a given index: $q(i) = 1$ for $i \leq n$ and $q(i) = 0$ for $i > n$. We also use the following two auxiliary functions $var$ and $sgn$ on literals of $\Phi$:

$$var(x_i) = var(\neg x_i) = i, \quad sgn(x_i) = 1, \quad sgn(\neg x_i) = -1.$$

A valuation is a (possibly partial) function assigning a Boolean value to the variables.

We construct instances over the schema consisting of a single relation

$$R(A_1, B_1, A_2, B_2, A_3, B_3, A_4, B_4).$$

The set of integrity constraints is

$$F = \{A_1 \rightarrow B_1, A_2 \rightarrow B_2, A_3 \rightarrow B_3, A_4 \rightarrow B_4\}.$$

The reduction uses the following types of facts:

• $v_i$ and $\bar{v}_i$ corresponding to the positive and negative valuations of $x_i$ resp. (for $i \in \{1, \ldots, n+m\}$)

$$v_i = R(0, q(i), i, 1, i, 1, i, 1), \quad \bar{v}_i = R(0, q(i), i, -1, i, -1, i, -1),$$

• $d_k$ corresponding to $c_k$ (for $k \in \{1, \ldots, s\}$)

$$d_k = R(0, 1, var(\ell_{k,1}), sgn(\ell_{k,1}), var(\ell_{k,2}), sgn(\ell_{k,2}), var(\ell_{k,3}), sgn(\ell_{k,3})),$$

• $p_\exists$ and $p_\forall$ used to partition the set of all repairs

$$p_\exists = R(0, 0, 0, 0, 0, 0, 0, 0), \quad p_\forall = R(0, 1, 0, 0, 0, 0, 0, 0).$$

For the ease of reference by $L_{k,p}$ we denote the fact corresponding to the satisfying valuation of literal $\ell_{k,p}$, i.e.:

$$L_{k,p} = \begin{cases} v_i & \text{when } \ell_{k,p} = x_i, \\ \bar{v}_i & \text{when } \ell_{k,p} = \neg x_i. \end{cases}$$

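The constructed instance is

$$I_\Phi = \{v_1, \bar{v}_1, \ldots, v_{n+m}, \bar{v}_{n+m}, d_1, \ldots, d_s, p_\forall, p_\exists\},$$

and $\succ_\Phi$ is the minimal priority of $I_\Phi$ w.r.t. $F$ such that:

$$v_i \succ_\Phi d_k \quad \text{if } c_k \text{ uses a positive literal } x_i,$$
$$\bar{v}_i \succ_\Phi d_k \quad \text{if } c_k \text{ uses a negative literal } \neg x_i,$$
$$p_\exists \succ_\Phi v_i \quad \text{for all } i \in \{1, \ldots, n\},$$
$$p_\exists \succ_\Phi \bar{v}_i \quad \text{for all } i \in \{1, \ldots, n\},$$
$$p_\exists \succ_\Phi p_\forall.$$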

Figure 4 contains a prioritized conflict graph of the instance and the priority obtained for the formula:

$$\Psi' = \forall x_1, x_2, x_3.\ \exists x_4, x_5.\ (\neg x_1 \vee x_4 \vee x_2) \wedge (\neg x_2 \vee \neg x_5 \vee \neg x_3).$$

Figure 4: The prioritized conflict graph for $\Psi' = \forall x_1, x_2, x_3.\ \exists x_4, x_5.\ (\neg x_1 \vee x_4 \vee x_2) \wedge (\neg x_2 \vee \neg x_5 \vee \neg x_3)$. Dotted lines used to show the conflicts w.r.t.

The query used in the reduction is $Q = p_\exists$ and we claim that $\Psi$ is valid if and only if true is the $\mathcal{G}$-preferred consistent query answer to $Q$ in $I_\Phi$ w.r.t. $F$ and $\succ_\Phi$. The proof is technically elaborate but can be summarized as follows. First, we partition the set of repairs into $\exists$- and $\forall$-repairs that correspond to valuations of existential and universal variables. Next, we show that an $\exists$-repair is globally preferred over a $\forall$-repair if the combined valuation satisfies $\Phi$. Consequently, we argue that if the $\exists$-repairs are the only globally optimal repairs, then for every valuation of universal variables there exists a valuation of existential variables that together satisfy $\Phi$, i.e. $\Psi$ is valid.

We partition the set of all repairs of $I_\Phi$ into two disjoint classes: $\exists$-repairs that contain $p_\exists$ and $\forall$-repairs that do not contain $p_\exists$. We note that because of the FD $A_1 \rightarrow B_1$ every $\forall$-repair contains $p_\forall$. For the same reason, a $\forall$-repair is a subset of $\{v_1, \bar{v}_1, \ldots, v_n, \bar{v}_n, d_1, \ldots, d_s, p_\forall\}$ and an $\exists$-repair is a subset of $\{v_{n+1}, \bar{v}_{n+1}, \ldots, v_{n+m}, \bar{v}_{n+m}, p_\exists\}$.

We use $\exists$- and $\forall$-repairs to represent all possible valuations of existential and universal variables respectively. To easily move from a partial valuation of variables to a repair we define the following two operators:

$$I_\exists[V] = \{v_i \mid V(x_i) = true \wedge q(i) = 0\} \cup \{\bar{v}_i \mid V(x_i) = false \wedge q(i) = 0\} \cup \{p_\exists\},$$

$$I_\forall[V] = \{v_i \mid V(x_i) = true \wedge q(i) = 1\} \cup \{\bar{v}_i \mid V(x_i) = false \wedge q(i) = 1\} \cup \{p_\forall\} \cup \{d_k \mid \text{for every literal } \ell_{k,p} \text{ of } c_k \text{, for which } V \text{ is defined on the variable used in } \ell_{k,p} \text{, we have } V \not\models \ell_{k,p}\}.$$

To move in the opposite direction, from a repair to a (possibly partial) valuation we use:

$$V[I'](x_i) = \begin{cases} true & \text{if } v_i \in I', \\ false & \text{if } \bar{v}_i \in I', \\ \text{undefined} & \text{otherwise.} \end{cases}$$

We observe that $V[\cdot]$ defines a one-to-one correspondence between $\exists$-repairs and total valuations of existential variables. A similar statement, however, does not hold for $\forall$-repairs because of the interaction between facts $d_k$ and the facts corresponding to universal variables. Consequently, for some $\forall$-repair $I'$ the function $V[I']$ may be only a partial valuation of universal variables. We call a $\forall$-repair $I'$ strict if $V[I']$ is a total valuation of universal variables. In this way, $V[\cdot]$ defines a one-to-one correspondence between strict $\forall$-repairs and total valuations of the universal variables. The following result allows us to remove non-strict $\forall$-repairs from consideration.

Lemma 1 Strict $\forall$-repairs are exactly $\gg_{\mathcal{G}}$-maximal $\forall$-repairs.

Proof First, we prove that no non-strict $\forall$-repair is $\gg_{\mathcal{G}}$-maximal. For that we show how, for any non-strict $\forall$-repair $I'$, to construct a strict $\forall$-repair $I''$ such that $I'' \gg_{\mathcal{G}} I'$. Take the partial valuation $V' = V[I']$ and extend it to a total valuation $V''$ of universal variables by assigning the value false to variables undefined by $V'$, i.e.

$$V'' = V' \cup \{(x_i, false) \mid 1 \leq i \leq n \wedge V'(x_i) \text{ is undefined}\}.$$

We take $I'' = I_\forall[V'']$ and show that

$$\forall q' \in I' \setminus I''.\ \exists q'' \in I'' \setminus I'.\ q'' \succ_\Phi q'.$$

There are 4 cases of values of $q'$ to consider:

1. $q' = p_\exists$, $q' = v_i$, or $q' = \bar{v}_i$ for $i \in \{n+1, \ldots, n+m\}$ is not possible because neither of $I'$ and $I''$ contains these facts (being $\forall$-repairs).

2. $q' = p_\forall$ is not possible because both $I'$ and $I''$ are $\forall$-repairs.

3. $q' = v_i$ or $q' = \bar{v}_i$ for some $i \in \{1, \ldots, n\}$ is also impossible because from the construction of $I''$ we know that

$$I'' \cap \{v_1, \bar{v}_1, \ldots, v_n, \bar{v}_n\} \supseteq I' \cap \{v_1, \bar{v}_1, \ldots, v_n, \bar{v}_n\}.$$

4. $q' = d_k$ for some $k \in \{1, \ldots, s\}$. The neighborhood of $d_k$ in the conflict graph consists of facts $p_\exists$, $L_{k,1}$, $L_{k,2}$, and $L_{k,3}$. We observe that none of these facts belongs to $I'$. However, one of the facts must belong to $I''$ because $q' \notin I''$ and since $I''$ is a maximal consistent subset of $I_\Phi$. Since $I''$ is a $\forall$-repair, $p_\exists$ does not belong to $I''$. Therefore, for some $p \in \{1, 2, 3\}$ the fact $L_{k,p}$ must belong to $I''$. Consequently, $q'' = L_{k,p} \succ_\Phi q'$.

Now, we show that every strict $\forall$-repair is also $\gg_{\mathcal{G}}$-maximal among $\forall$-repairs. Suppose otherwise, i.e. for some strict $\forall$-repair $I'$ there exists a $\forall$-repair $I''$ such that $I'' \gg_{\mathcal{G}} I'$. Since $I'$ is strict it contains $v_i$ or $\bar{v}_i$ for every $i \in \{1, \ldots, n\}$. By the construction of the priority $\succ_\Phi$ the repairs $I'$ and $I''$ must agree on facts $v_1, \bar{v}_1, \ldots, v_n, \bar{v}_n$. Therefore $I' = I_\forall[V[I'']]$ and, using the reasoning from the previous part, we can show that $I' \gg_{\mathcal{G}} I''$. Since $\succ_\Phi$ is acyclic, by Proposition 3 this gives us $I' = I''$. □

The central result in our reduction follows. 

Lemma 2 For any total valuation $V$, $I_\exists[V] \gg_{\mathcal{G}} I_\forall[V]$ if and only if $V \models \Phi$.

Proof For the if part, because a $\forall$-repair is disjoint with any $\exists$-repair, it is enough to show that for any fact $q' \in I_\forall[V]$ there exists a fact $q'' \in I_\exists[V]$ such that $q'' \succ_\Phi q'$. For $p_\forall, v_1, \bar{v}_1, \ldots, v_n, \bar{v}_n$ we simply choose $p_\exists$. If $d_k$ belongs to $I_\forall[V]$, we note that none of the neighbors of $d_k$ belongs to $I_\forall[V]$. This implies that none of the literals of $c_k$ using a universal variable is satisfied by $V$. Hence there must exist a literal $\ell_{k,p}$ of the clause $c_k$ that uses an existential variable and that is satisfied by $V$. Consequently, we have $L_{k,p} \in I_\exists[V]$ and $L_{k,p} \succ_\Phi d_k$.

For the only if part take any $k \in \{1, \ldots, s\}$ and consider the conjunct $c_k = \ell_{k,1} \vee \ell_{k,2} \vee \ell_{k,3}$. If none of the literals which use universal variables is satisfied by $V$ (otherwise $c_k$ is satisfied immediately), then none of the corresponding $L_{k,p}$ belongs to $I_\forall[V]$, and consequently, $d_k$ is in $I_\forall[V]$. Then $I_\exists[V]$ must contain a fact $L_{k,p'}$ corresponding to one of the literals of $c_k$ using an existential variable. This implies that $V \models \ell_{k,p'}$, and consequently, $V \models c_k$. □

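This gives us the following.

Fact 3 The QBF $\Psi$ is true if and only if for any strict $\forall$-repair $I'$ there exists an $\exists$-repair $I''$ such that $I'' \gg_{\mathcal{G}} I'$.

Because only an $\exists$-repair can dominate a strict $\forall$-repair and every non-strict $\forall$-repair is dominated by a strict one, we can make a more general statement.

Fact 4 The QBF $\Psi$ is true if and only if for any $\forall$-repair $I'$ there exists a repair $I''$ such that $I'' \gg_{\mathcal{G}} I'$.

$\forall$-repairs are defined as repairs that do not contain the fact $p_\exists$ and thus:

$$\models \forall x_1, \ldots, x_n.\ \exists x_{n+1}, \ldots, x_{n+m}.\ \Phi \quad \text{iff}$$
$$\forall I' \in Rep(I_\Phi, F).\ [I' \models \neg p_\exists] \Rightarrow [\exists I'' \in Rep(I_\Phi, F).\ I'' \gg_{\mathcal{G}} I'] \quad \text{iff}$$
$$\forall I' \in Rep(I_\Phi, F).\ [\nexists I'' \in Rep(I_\Phi, F).\ I'' \gg_{\mathcal{G}} I'] \Rightarrow [I' \models p_\exists] \quad \text{iff}$$
$$\forall I' \in \mathcal{G}Rep(I_\Phi, F, \succ_\Phi).\ I' \models p_\exists \quad \text{iff}$$
$$(I_\Phi, \succ_\Phi) \in \mathcal{D}_{F,Q}^{\mathcal{G}}.$$

We finish by observing that the reduction can be carried out in polynomial time.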

To show coNP-hardness of $\mathcal{B}_F^{\mathcal{G}}$ we remark that a 3CNF formula $\Phi$ can be treated as a $\forall^*\exists^*$QBF with no universal variables. This way, we use the previous transformation to reduce the complement of 3SAT to $\mathcal{B}_F^{\mathcal{G}}$. If $I_\Phi$ is
the instance obtained from the transformation of then {pg} is a globally 
optimal repair of 7$ if and only if $ ^ SSAT. □ 

5 Pareto optimal repairs 

Another family of preferred repairs uses a notion of optimality that requires 
a stronger support from the priority to remove a repair from consideration. 

Definition 10 (Pareto optimal repairs $\mathcal{P}Rep$) Given an instance $I$, a set of integrity constraints $F$, and a priority $\succ$, an instance $I' \subseteq I$ is Pareto optimal w.r.t. $\succ$ and $F$ if no nonempty subset $X$ of facts from $I'$ can be replaced with a nonempty set $Y$ of facts from $I \setminus I'$ such that

$$\forall x \in X.\ \forall y \in Y.\ y \succ x \qquad (\star_{\mathcal{P}})$$

and the resulting set of facts is consistent with $F$. $\mathcal{P}Rep$ is the family of Pareto optimal repairs, i.e. $\mathcal{P}Rep(I, F, \succ)$ is the set of all repairs of $I$ w.r.t. $F$ that are Pareto optimal w.r.t. $\succ$ and $F$.

In the sequel, we fix an instance $I$ and a set of denial constraints $F$, and omit them when referring to the elements of $\mathcal{P}Rep(I, F, \succ)$.

We remark that Pareto optimality is weaker than global optimality.

Proposition 6 $\mathcal{P}Rep$ satisfies $\mathcal{P}1$-$\mathcal{P}4$. Also, $\mathcal{G}Rep \sqsubseteq \mathcal{P}Rep$.

Proof $\mathcal{G}Rep \sqsubseteq \mathcal{P}Rep$ follows from Definitions 9 and 10. The arguments used to prove $\mathcal{P}1$ through $\mathcal{P}4$ are essentially the same as in Proposition 4. □

To show that $\mathcal{P}Rep \not\sqsubseteq \mathcal{G}Rep$ we recall the instance $I_1$ from Example 2 whose prioritized conflict graph is in Figure 5. The repairs $I'_1$ and $I'_2$ are Pareto optimal but only $I'_1$ is globally optimal.



Figure 5: Prioritized conflict graph from Example 2. The facts shown are Mgr(Bob, $70k, R&D), Mgr(Mary, $50k, PR), Mgr(Ken, $60k, IT), Mgr(Bob, $60k, AD), Mgr(Mary, $40k, IT), and Mgr(Ken, $50k, PR).

Similarly to $\mathcal{G}Rep$, $\mathcal{P}$-preferred repairs have an alternative characterization that is based on extending the priority to a pre-order on repairs.

Proposition 7 For a given priority $\succ$ and two repairs $I'_1$ and $I'_2$, $I'_1$ is Pareto preferred over $I'_2$, denoted $I'_1 \gg_{\mathcal{P}} I'_2$, if

$$\exists y \in I'_1 \setminus I'_2.\ \forall x \in I'_2 \setminus I'_1.\ y \succ x. \qquad (\gg_{\mathcal{P}})$$

The following facts hold:

(i) a repair $I'$ is Pareto optimal if and only if it is $\gg_{\mathcal{P}}$-maximal, i.e. there is no repair $I''$ different from $I'$ such that $I'' \gg_{\mathcal{P}} I'$;

(ii) if $\succ$ is acyclic, then so is $\gg_{\mathcal{P}}$.

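Proof (i) We prove the contraposition, i.e. $I'$ is not Pareto optimal if and only if there exists a repair $I'' \neq I'$ such that $I'' \gg_{\mathcal{P}} I'$.

For the if part take $X = I' \setminus I''$ and $Y = \{y\}$, where $y \in I'' \setminus I'$ is such that $y \succ x$ for every $x \in X$ (it exists by $I'' \gg_{\mathcal{P}} I'$). Clearly, $X$ and $Y$ validate the condition $(\star_{\mathcal{P}})$ of Definition 10 and $(I' \setminus X) \cup Y$ is consistent (as a subset of $I''$). Consequently, $I'$ is not Pareto optimal. For the only if part take any nonempty $X \subseteq I'$ and nonempty $Y \subseteq I \setminus I'$ such that $(\star_{\mathcal{P}})$ holds and $J = (I' \setminus X) \cup Y$ is consistent. Take any repair $I''$ that contains all facts of $J$. Clearly, $I' \setminus I'' \subseteq X$ and $Y \subseteq I'' \setminus I'$, so it suffices to take $X$ and any $y \in Y$ to verify the condition $(\gg_{\mathcal{P}})$ of Proposition 7.

(ii) We observe that $I' \gg_{\mathcal{P}} I''$ implies $I' \gg_{\mathcal{G}} I''$. Thus, if $\gg_{\mathcal{P}}$ has cycles, then so does $\gg_{\mathcal{G}}$, and consequently, $\succ$. □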
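The two ways of lifting the priority to repairs differ only in the order of the quantifiers, which a small Python sketch makes visible. The representation is an assumption made for illustration: repairs are sets of fact names and the priority is a set of pairs `(y, x)` meaning $y \succ x$; the function names are hypothetical.

```python
def globally_preferred(first, second, prefers):
    """first >>_G second: every fact lost from `second` is dominated
    by some fact gained in `first`."""
    gained, lost = first - second, second - first
    return all(any((y, x) in prefers for y in gained) for x in lost)


def pareto_preferred(first, second, prefers):
    """first >>_P second: a single gained fact dominates every lost fact."""
    gained, lost = first - second, second - first
    return any(all((y, x) in prefers for x in lost) for y in gained)
```

When the two repairs differ, any witness `y` for `pareto_preferred` also serves as the dominating fact for each lost `x` in `globally_preferred`, which mirrors the implication used in part (ii) of the proof. The converse fails, for example, for the repairs {b1, b2} and {a1, a2} with only b1 preferred to a1 and b2 preferred to a2.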

The class of Pareto optimal repairs is the largest class of preferred repairs 
we consider in this paper. Therefore, the following result allows us to identify 
the computational implications of introducing preferences to the framework 
of consistent query answers. 

Theorem 5 For any family $\mathcal{X}Rep$ of Pareto optimal repairs satisfying $\mathcal{P}1$ and $\mathcal{P}2$, deciding $\mathcal{X}$-preferred consistent query answering is coNP-hard.

Proof We show the hardness by reducing the complement of SAT to $\mathcal{D}_{F,Q}^{\mathcal{X}}$. Take then any CNF formula $\Phi = c_1 \wedge \ldots \wedge c_m$ over variables $x_1, \ldots, x_n$ with $c_j = \ell_{j,1} \vee \ldots \vee \ell_{j,k_j}$. We assume that there are no repetitions of literals in a clause (i.e., $\ell_{j,l} \neq \ell_{j,l'}$ for $l \neq l'$). We construct a relation instance $I_\Phi$ over the schema $R(A_1, B_1, A_2, B_2)$ in the presence of two functional dependencies $F = \{A_1 \rightarrow B_1, A_2 \rightarrow B_2\}$. The instance $I_\Phi$ consists of the following facts:

• $w_i = R(i, 1, i, 1)$ corresponding to the positive valuation of $x_i$ (for every $i \in \{1, \ldots, n\}$),

• $\bar{w}_i = R(i, -1, -i, 1)$ corresponding to the negative valuation of $x_i$ (for every $i \in \{1, \ldots, n\}$),

• $d_j = R(n + j, 1, 0, 1)$ corresponding to the clause $c_j$ (for every $j \in \{1, \ldots, m\}$),

• $v_j^i = R(n + j, 1, -i, 0)$ encoding the use of $x_i$ in the clause $c_j$ (for any $i \in \{1, \ldots, n\}$ and $j \in \{1, \ldots, m\}$ such that $c_j$ uses $x_i$),

• $\bar{v}_j^i = R(n + j, 1, i, 0)$ encoding the use of $\neg x_i$ in the clause $c_j$ (for any $i \in \{1, \ldots, n\}$ and $j \in \{1, \ldots, m\}$ such that $c_j$ uses $\neg x_i$),

• $b = R(0, 0, 0, 0)$ corresponding to the formula $\Phi$.

The constructed priority $\succ_\Phi$ is the minimal priority of $I_\Phi$ w.r.t. $F$ such that:

$$\bar{w}_i \succ_\Phi v_j^i, \quad w_i \succ_\Phi \bar{v}_j^i, \quad v_j^i \succ_\Phi d_j, \quad \bar{v}_j^i \succ_\Phi d_j, \quad d_j \succ_\Phi b.$$

Figure 6 presents the prioritized conflict graph obtained from the formula $\Phi = (\neg x_1 \vee x_2 \vee x_3) \wedge (\neg x_3 \vee \neg x_4 \vee x_5)$.

Figure 6: The prioritized conflict graph for $\Phi = (\neg x_1 \vee x_2 \vee x_3) \wedge (\neg x_3 \vee \neg x_4 \vee x_5)$.

The query we consider is $Q = \neg b$. We claim that

$$(I_\Phi, \succ_\Phi) \in \mathcal{D}_{F,Q}^{\mathcal{X}} \iff \forall I' \in \mathcal{X}Rep(I_\Phi, F, \succ_\Phi).\ b \notin I' \iff \Phi \notin SAT.$$

For the if part, suppose there exists a repair $I' \in \mathcal{X}Rep(I_\Phi, F, \succ_\Phi)$ such that $b \in I'$. Obviously, for every $j \in \{1, \ldots, m\}$ the fact $d_j$ does not belong to $I'$. Also, for every $j$ at least one fact neighboring to $d_j$, other than $b$, is present in $I'$, or otherwise $I'$ is not a Pareto optimal repair. Similarly, $I'$ has either $w_i$ or $\bar{w}_i$ for every $i \in \{1, \ldots, n\}$, and hence, the following valuation is properly defined:

$$V(x_i) = \begin{cases} true & \text{if } w_i \in I', \\ false & \text{if } \bar{w}_i \in I'. \end{cases}$$

We claim that $V \models \Phi$. Suppose otherwise and take any clause $c_j$ unsatisfied by $V$. Let $x \neq b$ be the fact neighboring to $d_j$ that is present in $I'$. W.l.o.g. we can assume that $x = \bar{v}_j^{i_0}$ for some $i_0$, and then $\neg x_{i_0}$ is a literal of $c_j$. Also then, $w_{i_0}$ does not belong to $I'$ and so $V(x_{i_0}) = false$. This implies that $V \models \neg x_{i_0}$ and $V \models c_j$; a contradiction.

For the only if part, suppose there exists a valuation $V$ such that $V \models \Phi$ and consider the following instance

$$I' = \{w_i \mid V(x_i) = true\} \cup \{\bar{w}_i \mid V(x_i) = false\} \cup \{v_j^i \mid V(x_i) = true\} \cup \{\bar{v}_j^i \mid V(x_i) = false\} \cup \{b\}.$$

First, we note that $I'$ is a repair and a Pareto optimal one. Next, we show that $I' \in \mathcal{X}Rep(I_\Phi, F, \succ_\Phi)$. To prove this consider the following priority $\succ' = \succ_\Phi \cup \{(w_i, \bar{w}_i) \mid V(x_i) = true\} \cup \{(\bar{w}_i, w_i) \mid V(x_i) = false\}$. It can be easily verified that $I'$ is the only Pareto optimal repair of $I_\Phi$ w.r.t. $F$ and $\succ'$. Since $\mathcal{X}Rep$ satisfies $\mathcal{P}1$, we get $I' \in \mathcal{X}Rep(I_\Phi, F, \succ')$. Note that $\succ'$ is an extension of $\succ_\Phi$ and thus $I'$ belongs to $\mathcal{X}Rep(I_\Phi, F, \succ_\Phi)$ by $\mathcal{P}2$. Finally, we observe that $b \in I'$ which implies that true is not the $\mathcal{X}$-preferred consistent query answer to $Q$ in $I_\Phi$ w.r.t. $F$ and $\succ_\Phi$; a contradiction.

We finish the proof with the observation that the described reduction requires time polynomial in the size of the formula $\Phi$. □

Lemma 3 A repair $I'$ is not Pareto optimal w.r.t. $\succ$ if and only if there exists a fact $y \in I \setminus I'$ such that for every conflict $C$ in $I' \cup \{y\}$ there is $x \in C$ such that $y \succ x$.

Proof For the if part, let $C_1, \ldots, C_k$ be all conflicts in $I' \cup \{y\}$ and $x_i$ be the element of $C_i$ such that $y \succ x_i$ (for $i \in \{1, \ldots, k\}$). Clearly, $(I' \setminus \{x_1, \ldots, x_k\}) \cup \{y\}$ is consistent, which shows that $I'$ is not Pareto optimal.

For the only if part, take any nonempty $X$ and $Y$ such that $(I' \setminus X) \cup Y$ is consistent and $\forall y \in Y.\ \forall x \in X.\ y \succ x$. Fix any $y \in Y$ and take any conflict $C$ in $I' \cup \{y\}$. Clearly, $C$ contains an element $x$ of $X$ since $(I' \setminus X) \cup Y$ is consistent. Naturally $y \succ x$. □
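Lemma 3 translates almost literally into a polynomial-time test. The Python sketch below is an illustration only and relies on assumptions not made in the text: the constraints are given as an explicit list of conflicts (sets of facts), the priority is a set of pairs `(y, x)` meaning $y \succ x$, the argument `repair` is assumed to be a repair already, and the function name is hypothetical.

```python
def is_pareto_optimal(repair, instance, conflicts, prefers):
    """Test Pareto optimality of a repair via the condition of Lemma 3."""
    for y in instance - repair:
        extended = repair | {y}
        # Conflicts created by adding y to the repair.
        violated = [c for c in conflicts if c <= extended]
        # If y dominates some fact of every such conflict, y can replace
        # those facts, so the repair is not Pareto optimal.
        if all(any((y, x) in prefers for x in c) for c in violated):
            return False
    return True
```

The check inspects one outside fact at a time and never enumerates subsets, in contrast to the exponential search for $X$ and $Y$ needed for global optimality.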

Corollary 1 $\mathcal{P}$-preferred repair checking is in LOGSPACE and $\mathcal{P}$-preferred consistent query answering is coNP-complete.

Proof We observe that to check the condition of Lemma 3 we need to iterate over $I \setminus I'$ which can be accomplished with two pointers: one to iterate over $I$ and the other to scan $I'$. Recall that a conflict is a set of facts and its cardinality is bounded by the size of $F$ which is assumed to be a constant parameter. Hence, we can iterate over all conflicts of $I'$ (extended with one fact) using a constant number of pointers scanning $I'$. Consequently, $\mathcal{P}$-preferred repair checking is in LOGSPACE. $\mathcal{D}_{F,Q}^{\mathcal{P}}$ belongs to coNP because it suffices to guess a Pareto optimal repair in which the query is not true, and its coNP-hardness follows from Theorem 5. □



Now, we investigate a sound and complete algorithm for computing Pareto optimal repairs. First, we observe that it is possible to use an algorithm similar to Algorithm 2, starting with an arbitrary repair and attempting to iteratively improve its conformance with the priority until a Pareto optimal repair is reached. While checking Pareto optimality can be done in polynomial time, we note that the sequence of repairs of exponential length constructed in Example 3 is also a $\gg_{\mathcal{P}}$-chain. Consequently, such an algorithm may require an exponential number of iterations to obtain a Pareto optimal repair.

We propose a simpler approach where we construct an arbitrary repair and, if it is not Pareto optimal, we discard it and construct a common optimal repair using Algorithm 4 presented in the next section. Common optimal repairs constitute a subset of Pareto optimal repairs and thus Algorithm 3 is sound. Naturally, it is also complete because it is based on Algorithm 1 which constructs all repairs. Finally, it works in polynomial time since checking Pareto optimality is in LOGSPACE and Algorithms 1 and 4 work in polynomial time.
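The control flow of this approach is short enough to state as a Python sketch. It is deliberately abstract: the three callables are hypothetical placeholders for Algorithm 1, the Pareto optimality check, and Algorithm 4, and their signatures are assumptions made only for this illustration.

```python
def pareto_optimal_repair(instance, construct_repair, is_pareto_optimal,
                          common_optimal_repair):
    """Return a Pareto optimal repair using one check and one fallback.

    construct_repair(instance)      -> some repair (role of Algorithm 1)
    is_pareto_optimal(repair)       -> bool (polynomial-time check)
    common_optimal_repair(instance) -> a common optimal repair (Algorithm 4)
    """
    candidate = construct_repair(instance)
    if is_pareto_optimal(candidate):
        return candidate
    # The candidate is discarded; a common optimal repair is Pareto optimal.
    return common_optimal_repair(instance)
```

Each callable is invoked at most once, so the running time is the sum of three polynomial-time steps; no iterative improvement loop is involved.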

Algorithm 3 Constructing a Pareto optimal repair of $I$ w.r.t. $F$ and $\succ$.
1: construct a repair $I'$ of $I$ /* Algorithm 1 */
2: if $I'$ is Pareto optimal w.r.t. $\succ$ then
3:     return $I'$
4: else
5:     return any common optimal repair of $I$ w.r.t. $\succ$

Proposition 8 Algorithm 3 is a sound and complete algorithm constructing Pareto optimal repairs. It works in time polynomial in the size of the input instance and the priority relation.

6 Common optimal repairs

The last family of preferred repairs is based on a notion of optimality different from global and Pareto optimality. When we repair a database with a priority that is not total and resolve a conflict that is not prioritized, we commit to a particular prioritization of this conflict. Constructing a repair that conforms to a given priority can be seen as constructing a total extension of that priority such that the constructed repair is the only repair globally optimal w.r.t. the total priority. We remark that this notion is quite robust as it remains identical if we replace in it global optimality by Pareto optimality. The same holds for all results stated in this section. This is because $\mathcal{G}Rep$ and $\mathcal{P}Rep$ coincide for total priorities by $\mathcal{G}Rep \sqsubseteq \mathcal{P}Rep$ and $\mathcal{P}4$ for $\mathcal{P}Rep$ and $\mathcal{G}Rep$.

Definition 11 (Common optimal repairs $\mathcal{C}Rep$) Given an instance $I$, a set of denial constraints $F$, and a priority $\succ$, an instance $I' \subseteq I$ is common optimal w.r.t. $\succ$ and $F$ if and only if there exists a total priority $\succ' \supseteq \succ$ such that $I'$ is globally optimal w.r.t. $\succ'$ and $F$. $\mathcal{C}Rep$ is the family of common optimal repairs, i.e. $\mathcal{C}Rep(I, F, \succ)$ is the set of all repairs of $I$ w.r.t. $F$ that are common optimal w.r.t. $\succ$ and $F$.

Again, we fix an instance $I$ and a set of denial constraints $F$, and omit them when referring to the elements of $\mathcal{C}Rep(I, F, \succ)$.

It is an open question whether there exists an intuitive definition of a pre-order on repairs whose maximal elements are exactly common optimal repairs. We show, however, that the family of common optimal repairs is the smallest family of globally optimal repairs that satisfies the properties $\mathcal{P}1$ and $\mathcal{P}2$.

Lemma 4 A repair $I'$ is common optimal w.r.t. $F$ and $\succ$ if and only if $I' \in \mathcal{X}Rep(I, F, \succ)$ for every family $\mathcal{X}Rep$ of globally optimal repairs that satisfies $\mathcal{P}1$ and $\mathcal{P}2$.

Proof For the only if part, take a total priority $\succ' \supseteq \succ$ for which $I'$ is globally optimal and observe that by $\mathcal{P}4$ for $\mathcal{G}Rep$ the repair $I'$ is the only globally optimal repair w.r.t. $\succ'$. Consequently, $I' \in \mathcal{X}Rep(I, F, \succ')$ for any family $\mathcal{X}Rep$ satisfying $\mathcal{P}1$. Moreover, $I' \in \mathcal{X}Rep(I, F, \succ)$ because $\mathcal{X}Rep$ satisfies $\mathcal{P}2$ and $\succ \subseteq \succ'$.

For the if part, suppose $\succ$ has no acyclic total extension $\succ'$ for which $I'$ is globally optimal w.r.t. $\succ'$. Consider the following family of globally optimal repairs

$$\mathcal{X}Rep(I^\circ, F^\circ, \succ^\circ) = \begin{cases} \mathcal{G}Rep(I^\circ, F^\circ, \succ^\circ) \setminus \{I'\} & \text{if } \succ^\circ \supseteq \succ,\ I^\circ = I, \text{ and } F^\circ = F, \\ \mathcal{G}Rep(I^\circ, F^\circ, \succ^\circ) & \text{otherwise.} \end{cases}$$

It can be easily seen that $\mathcal{X}Rep$ satisfies $\mathcal{P}1$ and $\mathcal{P}2$. We observe that $I' \notin \mathcal{X}Rep(I, F, \succ)$, so $I'$ does not belong to every family of globally optimal repairs satisfying $\mathcal{P}1$ and $\mathcal{P}2$. □



Proposition 9 $\mathcal{C}Rep$ satisfies $\mathcal{P}1$-$\mathcal{P}4$ and $\mathcal{C}Rep \sqsubseteq \mathcal{G}Rep$.

Proof $\mathcal{C}Rep \sqsubseteq \mathcal{G}Rep$ because $\mathcal{G}Rep$ is a family of globally optimal repairs that satisfies both $\mathcal{P}1$ and $\mathcal{P}2$ (cf. Proposition 4).

$\mathcal{P}1$ follows from the definition of common optimal repairs and the observation that any priority $\succ$ can be extended to some total $\succ'$ (otherwise $\succ$ would be a cyclic relation). Therefore $\emptyset \neq \mathcal{G}Rep(I, F, \succ') \subseteq \mathcal{C}Rep(I, F, \succ)$ by $\mathcal{P}4$ for $\mathcal{G}Rep$. $\mathcal{P}2$ follows directly from Lemma 4.

To show $\mathcal{P}3$ we take an arbitrary repair $I'$ and construct a priority $\succ$ such that $I'$ is globally optimal w.r.t. $\succ$. For that we take any total ordering $\succ_1$ of $I'$ and any total ordering $\succ_2$ of $I \setminus I'$. We obtain $\succ$ with a careful composition of $\succ_1$ with $\succ_2$:

$$R(t) \succ R'(t') \iff \begin{cases} R(t) \succ_1 R'(t') & \text{if } R(t), R'(t') \in I', \\ true & \text{if } R(t) \in I' \text{ and } R'(t') \in I \setminus I', \\ R(t) \succ_2 R'(t') & \text{if } R(t), R'(t') \in I \setminus I', \\ false & \text{if } R(t) \in I \setminus I' \text{ and } R'(t') \in I', \end{cases}$$

for any two neighboring facts $R(t)$ and $R'(t')$ ($R(t) \not\succ R'(t')$ if $R(t)$ and $R'(t')$ are not neighboring). Clearly, $\succ$ is acyclic since it is based on the acyclic components $\succ_1$ and $\succ_2$, and we add an element $(R(t), R'(t'))$ only if $R(t) \in I'$ and $R'(t') \notin I'$. Naturally, $\succ$ is a total priority. It is also easy to verify that $I'$ is globally optimal w.r.t. $\succ$. $\mathcal{P}4$ follows from $\mathcal{C}Rep \sqsubseteq \mathcal{G}Rep$, $\mathcal{P}4$ for $\mathcal{G}Rep$, and $\mathcal{P}1$ for $\mathcal{C}Rep$ proved above. □

Common optimal repairs can also be characterized as exactly those repairs that can be obtained with an iterative selection of non-dominated facts, i.e. facts defined with the winnow operator [10]:

$$\omega_\succ(I) = \{R(t) \in I \mid \nexists R'(t') \in I.\ R'(t') \succ R(t)\}.$$



Theorem 6 Algorithm 4 is a sound and complete algorithm constructing common optimal repairs. It works in time polynomial in the size of the input instance and the priority relation.

Algorithm 4 Constructing a common optimal repair.
1: $I^\circ \leftarrow I$
2: $J \leftarrow \emptyset$
3: while $\omega_\succ(I^\circ) \neq \emptyset$ do
4:     choose $R(t) \in \omega_\succ(I^\circ)$
5:     $I^\circ \leftarrow I^\circ \setminus \{R(t)\}$
6:     if $J \cup \{R(t)\} \models F$ then
7:         $J \leftarrow J \cup \{R(t)\}$
8: return $J$
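A direct Python transcription of Algorithm 4 may help to see how the winnow operator drives the construction. It is a sketch under assumptions not made in the text: the constraints are represented by an explicit list of conflicts (sets of facts), the priority is a set of pairs `(y, x)` meaning $y \succ x$, and the nondeterministic choice in line 4 is replaced by a hypothetical `choose` parameter that defaults to `min`.

```python
def consistent(facts, conflicts):
    """A set of facts is consistent if it contains no conflict entirely."""
    return not any(conflict <= facts for conflict in conflicts)


def winnow(facts, prefers):
    """Facts of `facts` not dominated by any other fact of `facts`."""
    return {f for f in facts if not any((g, f) in prefers for g in facts)}


def common_optimal_repair(instance, conflicts, prefers, choose=min):
    """Iteratively pick non-dominated facts, keeping those that fit."""
    remaining = set(instance)  # the set I° of Algorithm 4
    repair = set()             # the set J of Algorithm 4
    while True:
        candidates = winnow(remaining, prefers)
        if not candidates:
            break
        fact = choose(candidates)          # line 4
        remaining.remove(fact)             # line 5
        if consistent(repair | {fact}, conflicts):
            repair.add(fact)               # lines 6-7
    return frozenset(repair)
```

Different `choose` functions correspond to different runs of the algorithm and may yield different common optimal repairs; with an acyclic priority the winnow of a nonempty set is nonempty, so the loop ends only when every fact has been considered.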



Proof We observe that the instance resulting from an execution of Algo- 
rithm H] can be associated with the sequence of choices made in line 4 during 
the execution. We also observe that this sequence is an ordering of the facts 
of the original instance I. 

To show soundness, we take an instance /' obtained with the sequence 
of choices xi, . . . ,Xn- We show that I' is common optimal by extending > 
to a total priority >' for which I' is globally optimal. The priority >' is 
defined as 

Xi >' Xj Xi and Xj are neighboring and i < j. 

Clearly, >' is acyclic and a total priority. We also observe that > ^ >' 
because Xi > Xj implies that i < j, i.e. Xi is selected before Xj (choices are 
constrained by a;>). 

To show that I' is globally optimal w.r.t. >' take any Ac/' and 
any Y c /\/' such that (/'\A) u y is consistent. Now, take any Xj € Y 
and observe that adding Xj to the instance being created by Algorithm H] 
must have been prevented by some confiict {xj^, . . . , Xi,^, xj} with the facts 
added previously, i.e. ig < j for £ e {1, . . . , k}. Consequently, Xi^ >' Xj for 
£ e {1, . . . , A;}. We observe that at least one of , . . . , Xj^. must be present in 
X since Y contains xj, I' contains Xj^, . . . , Xi^ , and (/'\A) u y is consistent. 
Thus, Xj )f xi for some xi € X, I' is globally optimal w.r.t. >', and by V2 
for QRep we get that I' is globally optimal w.r.t. >. 

To show completeness, we take a common optimal repair /' and the total 
priority >' for which I' is globally optimal and use >' to construct a valid 
sequence of choices yielding /'. Naturally, the same choice sequence will be 
valid for an execution with > since >' extends >. 

Take an execution of Algorithm 4 on $I$ with $\succ'$ that constructs some 
instance $I''$ with the sequence of choices $x_1, \ldots, x_n$. Note that 
if $x_i$ and $x_j$ are neighboring, then $x_i \succ' x_j$ if and only if 
$i < j$. Suppose that $I'' \neq I'$ and take the minimal index $i$ of the 
element $x_i$ on which $I'$ and $I''$ differ. Note that 
$I' \cap \{x_1, \ldots, x_{i-1}\} = I'' \cap \{x_1, \ldots, x_{i-1}\}$, and 
either $x_i \in I'$ and $x_i \notin I''$, or $x_i \notin I'$ and 
$x_i \in I''$. The first case is not possible because Algorithm 4 would 
have discarded $x_i$ only if there had been a conflict involving $x_i$ and 
some facts of $I'' \cap \{x_1, \ldots, x_{i-1}\}$. Then, however, the same 
conflict would have been included in $I'$, i.e. $I'$ would not have been 
consistent. Suppose then that $x_i \notin I'$ and $x_i \in I''$. Let 
$C_1, \ldots, C_k$ be all conflicts present in $I' \cup \{x_i\}$ w.r.t. 
$F$. Since $I' \cup \{x_i\}$ is not consistent, there is at least one 
conflict in $I' \cup \{x_i\}$. Naturally, 
$(I' \cap \{x_1, \ldots, x_{i-1}\}) \cup \{x_i\}$ is consistent, and thus 
for every $j \in \{1, \ldots, k\}$ the conflict $C_j$ contains a fact 
$x_{i_j}$ such that $i_j > i$. Let $X = \{x_{i_1}, \ldots, x_{i_k}\}$ and 
$Y = \{x_i\}$, and observe that $(I' \setminus X) \cup Y$ is consistent. 
Moreover, $X$ and $Y$ satisfy the condition of Definition 9 since 
$i_j > i$ implies that $x_i \succ' x_{i_j}$. Consequently, $I'$ is not 
globally optimal; a contradiction. □ 

Corollary 2 $\mathcal{C}$-preferred repair checking is in PTIME and 
$\mathcal{C}$-preferred consistent query answering is coNP-complete. 

Proof To check if a repair $I'$ is common optimal we use Algorithm 4 to 
simulate the construction of $I'$ by restricting the choice in line 4 to 
facts of $\omega_\succ(I^\circ) \cap I'$. The repair $I'$ is common optimal 
if and only if such a simulation can be performed successfully (i.e. it 
produces $I'$). Naturally, $\mathcal{C}$-preferred consistent query 
answering belongs to coNP and its coNP-completeness follows from 
Theorem 5. □ 
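
The simulation used in the proof can be sketched in Python as follows. This is a minimal sketch under assumptions that are ours, not the paper's: facts are tuples, each functional dependency is a pair of attribute-position tuples, the priority is a set of (dominating, dominated) pairs, and the function name `is_common_optimal` is hypothetical. The sketch also lets the simulation consume a fact outside the candidate repair whenever line 6 of Algorithm 4 would discard it anyway; this is our reading of how the restricted choice in line 4 can still exhaust the instance.

```python
from itertools import combinations


def conflicting(s, t, fds):
    """True if facts s and t jointly violate some functional dependency."""
    return any(
        all(s[i] == t[i] for i in lhs) and any(s[i] != t[i] for i in rhs)
        for lhs, rhs in fds
    )


def consistent(facts, fds):
    """True if no two facts conflict (sufficient for functional dependencies)."""
    return not any(conflicting(s, t, fds) for s, t in combinations(facts, 2))


def winnow(facts, priority):
    """omega(facts): the facts not dominated by another fact of `facts`."""
    return {x for x in facts if not any((y, x) in priority for y in facts)}


def is_common_optimal(candidate, instance, fds, priority):
    """Greedy simulation of Algorithm 4 steered towards `candidate`."""
    candidate = set(candidate)
    remaining, built = set(instance), set()
    progress = True
    while remaining and progress:
        progress = False
        for fact in winnow(remaining, priority):
            rejected = not consistent(built | {fact}, fds)
            # Allowed choices: facts of the candidate, or facts that
            # line 6 would discard because they conflict with `built`.
            if fact in candidate or rejected:
                remaining.remove(fact)
                if not rejected:
                    built.add(fact)
                progress = True
                break
    return not remaining and built == candidate
```

Each pass of the outer loop removes one fact or stops, so the number of winnow computations is linear in the size of the instance, in line with the polynomial bound stated in the corollary.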

The exact complexity of $\mathcal{C}$-preferred repair checking, whether 
it is PTIME-complete or in LOGSPACE, remains an open question. 

The introduced families of preferred repairs create a hierarchy: 

$\mathcal{C}Rep \sqsubseteq \mathcal{G}Rep \sqsubseteq \mathcal{P}Rep.$ 

Recall from the previous section that $\mathcal{P}Rep \neq \mathcal{G}Rep$ 
(cf. Figure 5). The following example shows also that 
$\mathcal{C}Rep \neq \mathcal{G}Rep$. Thus, the hierarchy is proper. 

Example 4 Consider the schema of one relation name $R(A, B, C, D)$ with 
a set of two functional dependencies 
$F = \{R : A \to B,\ R : C \to D\}$. Take the following instance 

$I = \{R(1,1,1,1),\ R(1,2,1,2),\ R(1,3,0,0),\ R(0,0,1,3)\}$ 

and the following priority relation 

$\succ = \{(R(1,1,1,1), R(1,3,0,0)),\ (R(1,2,1,2), R(0,0,1,3))\}.$ 

Figure 7: The prioritized conflict graph $G(I, F, \succ)$. 

The corresponding prioritized conflict graph is in Figure 7. The instance 
$I$ has 3 repairs: 

$I'_1 = \{R(1,1,1,1)\}, \quad I'_2 = \{R(1,2,1,2)\}, \quad I'_3 = \{R(1,3,0,0), R(0,0,1,3)\}.$ 

All repairs are globally optimal, but only $I'_1$ and $I'_2$ are common 
optimal. □ 

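
The repairs of Example 4 and the outcomes of Algorithm 4 on it can be enumerated by brute force. The following Python sketch does so under assumptions of ours: facts are 4-tuples over $(A, B, C, D)$, each functional dependency is a pair of attribute-position tuples, and the helper names `repairs` and `algorithm4_outcomes` are hypothetical. It explores every choice sequence, so it is meant only for instances of this size.

```python
from itertools import combinations


def conflicting(s, t, fds):
    """True if facts s and t jointly violate some functional dependency."""
    return any(
        all(s[i] == t[i] for i in lhs) and any(s[i] != t[i] for i in rhs)
        for lhs, rhs in fds
    )


def consistent(facts, fds):
    """True if no two facts conflict (sufficient for functional dependencies)."""
    return not any(conflicting(s, t, fds) for s, t in combinations(facts, 2))


def winnow(facts, priority):
    """omega(facts): the facts not dominated by another fact of `facts`."""
    return {x for x in facts if not any((y, x) in priority for y in facts)}


def repairs(instance, fds):
    """All maximal consistent subsets of `instance` (exponential brute force)."""
    subsets = [
        set(combo)
        for size in range(len(instance) + 1)
        for combo in combinations(sorted(instance), size)
        if consistent(combo, fds)
    ]
    return [s for s in subsets if not any(s < other for other in subsets)]


def algorithm4_outcomes(instance, fds, priority):
    """Every instance Algorithm 4 can return, over all choices in line 4."""
    outcomes = set()

    def run(remaining, built):
        options = winnow(remaining, priority)
        if not options:
            outcomes.add(frozenset(built))
            return
        for fact in options:
            extended = built | {fact}
            run(remaining - {fact}, extended if consistent(extended, fds) else built)

    run(frozenset(instance), frozenset())
    return outcomes


# The data of Example 4: F = {A -> B, C -> D}.
a, b, c, d = (1, 1, 1, 1), (1, 2, 1, 2), (1, 3, 0, 0), (0, 0, 1, 3)
example_instance = {a, b, c, d}
example_fds = [((0,), (1,)), ((2,), (3,))]
example_priority = {(a, c), (b, d)}

print(len(repairs(example_instance, example_fds)))  # -> 3
print(sorted(sorted(r) for r in algorithm4_outcomes(
    example_instance, example_fds, example_priority)))
# -> [[(1, 1, 1, 1)], [(1, 2, 1, 2)]]
```

The first choice must be one of the two undominated facts, and whichever is chosen conflicts with every other fact, which is why the third repair is never produced.
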
We observe, however, that under certain conditions this hierarchy collapses. 

Proposition 10 $\mathcal{P}Rep$, $\mathcal{G}Rep$, and $\mathcal{C}Rep$ 
coincide under one of the following conditions: 

(i) the set of constraints $F$ consists of one key dependency only; 
(ii) the priority $\succ$ can be extended to acyclic priorities only. 

Moreover, $\mathcal{G}Rep$ and $\mathcal{C}Rep$ coincide if 

(iii) the set of constraints $F$ consists of one functional dependency only. 

Proof For (i), to show that $\mathcal{P}Rep \sqsubseteq \mathcal{C}Rep$ in 
the presence of exactly one key dependency, we use the fact that the 
conflict graph is a union of pairwise disjoint cliques and every repair 
consists of one element selected from each clique. 

We fix an instance $I$, a key dependency $F$, and a priority $\succ$. Let 
$C_1, \ldots, C_n$ be the cliques of $G(I, F)$. Take any 
$I' \in \mathcal{P}Rep(I, F, \succ)$ and let $R_1(t_1), \ldots, R_n(t_n)$ be 
the elements of $I'$ such that $R_i(t_i) \in C_i$. We note that since $I'$ 
is Pareto optimal, for every $i$ there is no 
$y \in C_i \setminus \{R_i(t_i)\}$ such that $y \succ R_i(t_i)$, and 
consequently, $R_i(t_i) \in \omega_\succ(C_i)$. Hence, 
$R_1(t_1), \ldots, R_n(t_n)$ is a proper choice sequence for Algorithm 4. 
Finally, we observe that if the fact $R_i(t_i)$ has been added to the 
constructed repair, then none of the facts of 
$C_i \setminus \{R_i(t_i)\}$ can be further added. 

For (ii), we take any $I' \in \mathcal{P}Rep(I, F, \succ)$ and construct a 
total extension $\succ'$ of $\succ$ by prioritizing conflicts unprioritized 
by $\succ$ in favor of $I'$, i.e. $\succ'$ is any total priority such that 
for any $x \in I'$ and any $y$ conflicting with $x$, if $y \not\succ x$ 
then $x \succ' y$. Since $\succ$ can be extended to acyclic orientations 
only, $\succ'$ is acyclic. Clearly, $I'$ is a Pareto optimal repair w.r.t. 
$\succ'$ and a unique one by P4 for $\mathcal{P}Rep$. Therefore 
$I' \in \mathcal{C}Rep(I, F, \succ')$ and by P2 we get 
$I' \in \mathcal{C}Rep(I, F, \succ)$. 

For (iii), we assume a single relation name $R$ with the functional 
dependency $X \to Y$ and use the notions of $X$-cluster and 
$(X, Y)$-cluster (Section 2.3). Let the instance $I$ be the union of the 
$X$-clusters $C_1, \ldots, C_n$. Take any globally optimal repair $I'$ and 
let it be the union of the $(X, Y)$-clusters $D_1, \ldots, D_n$ 
($D_i \subseteq C_i$ for every $i \in \{1, \ldots, n\}$). By global 
optimality of $I'$ we have that for every $i \in \{1, \ldots, n\}$ 

$\exists R_i(t_i) \in D_i.\ \forall y \in C_i \setminus D_i.\ y \not\succ R_i(t_i).$ 

Therefore, Algorithm 4 can perform the first $n$ iterations with a choice 
sequence beginning with $R_1(t_1), \ldots, R_n(t_n)$. Because 
$n(R_i(t_i)) = C_i \setminus D_i$ and elements of $D_i$ conflict only with 
elements of $C_i \setminus D_i$, the remaining choices can consist of any 
ordering of 
$(D_1 \setminus \{R_1(t_1)\}) \cup \ldots \cup (D_n \setminus \{R_n(t_n)\})$. 
Hence, $I'$ is a result of Algorithm 4. □ 

We note that the conditions are sufficient but not necessary. 

7 Tractable case 

The intractability proofs use at least 2 FDs. Next, we investigate the case 
when only one FD is present. 

Theorem 7 If the set of integrity constraints contains at most one 
functional dependency per relation name and no other constraints, then 
computing preferred consistent answers to quantifier-free queries is in 
PTIME for $\mathcal{P}Rep$, $\mathcal{G}Rep$, and $\mathcal{C}Rep$. 

Proof First, we observe that if only functional dependencies are 
considered, facts can create conflicts only with facts of the same relation 
and therefore we can limit our consideration to schemas consisting of one 
relation name only. Consequently, we assume a single relation name $R$ 
with the FD $X \to Y$ and use the notions of $X$-cluster and 
$(X, Y)$-cluster (Section 2.3). 

Now, we fix an instance $I$ and a priority $\succ$. For every fact 
$R(t) \in I$, by $C_{R(t)}$ we denote the $X$-cluster to which the fact 
$R(t)$ belongs and by $D_{R(t)}$ we denote its $(X, Y)$-cluster. 
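
The cluster notation can be made concrete with a short Python sketch. It assumes the usual reading of the definitions referenced above (an $X$-cluster collects the facts that agree on $X$, and an $(X, Y)$-cluster those that also agree on $Y$); the encoding of facts as tuples and the names `clusters` and `repairs_one_fd` are hypothetical. Under a single FD, each repair keeps exactly one $(X, Y)$-cluster from every $X$-cluster.

```python
from collections import defaultdict
from itertools import product


def clusters(instance, x_attrs, y_attrs):
    """Group facts into X-clusters, each split into its (X, Y)-clusters.

    Returns a mapping: X-value -> (Y-value -> set of facts).
    """
    grouped = defaultdict(lambda: defaultdict(set))
    for fact in instance:
        x_val = tuple(fact[i] for i in x_attrs)
        y_val = tuple(fact[i] for i in y_attrs)
        grouped[x_val][y_val].add(fact)
    return grouped


def repairs_one_fd(instance, x_attrs, y_attrs):
    """All repairs under the single FD X -> Y."""
    grouped = clusters(instance, x_attrs, y_attrs)
    per_x_cluster = [list(xy.values()) for xy in grouped.values()]
    # Pick one (X, Y)-cluster from every X-cluster and take the union.
    return [set().union(*choice) for choice in product(*per_x_cluster)]


# Hypothetical instance of R(A, B, C) with the FD A -> B.
example_instance = {(1, "a", 10), (1, "a", 20), (1, "b", 30), (2, "c", 40)}
for repair in repairs_one_fd(example_instance, (0,), (1,)):
    print(sorted(repair))
```

The number of repairs is the product of the numbers of $(X, Y)$-clusters per $X$-cluster, so the enumeration is exponential in general; the polynomial tests in the lemmas below avoid it by reasoning about one $X$-cluster at a time.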

We adopt the algorithm from [12]. We assume that the query $\Phi$ is 
in CNF, i.e. $\Phi = \Phi_1 \wedge \ldots \wedge \Phi_n$. By definition, 
true is not a preferred consistent query answer to $\Phi$ if and only if 
there exists a preferred repair $I'$ and $i \in \{1, \ldots, n\}$ such that 
$I' \not\models \Phi_i$. For every $i \in \{1, \ldots, n\}$, the algorithm 
attempts to verify whether a preferred repair satisfying $\neg\Phi_i$ 
exists. If this condition is satisfied for some 
$i \in \{1, \ldots, n\}$, then true is not the preferred consistent answer 
to $\Phi$. 

Now, fix $i$ and consider 

$\neg\Phi_i = R(t_1) \wedge \ldots \wedge R(t_k) \wedge \neg R(t_{k+1}) \wedge \ldots \wedge \neg R(t_m).$ 

To find if a preferred repair satisfying $\neg\Phi_i$ exists we use one of 
the two following polynomial tests depending on the family of preferred 
repairs we use. For simplicity, we assume that the facts 
$R(t_1), \ldots, R(t_k)$ and the facts $R(t_{k+1}), \ldots, R(t_m)$ belong 
to $I$; otherwise there is no repair satisfying $\neg\Phi_i$ or we can 
remove the negative literal from $\neg\Phi_i$, respectively (because 
repairs are subsets of $I$). Recall that globally optimal and common 
optimal repairs coincide in the presence of one FD. □ 

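Lemma 5 A globally optimal (common) repair $I'$ satisfying $\neg\Phi_i$ 
exists if and only if the following conditions are satisfied: 

(i) $\{R(t_1), \ldots, R(t_k)\}$ is conflict-free; 

(ii) $\{D_{R(t_1)}, \ldots, D_{R(t_k)}\} \cap \{D_{R(t_{k+1})}, \ldots, D_{R(t_m)}\} = \emptyset$; 

(iii) $D_{R(t_j)} \cap \omega_\succ(C_{R(t_j)}) \neq \emptyset$ for every $j \in \{1, \ldots, k\}$; 

(iv) $\omega_\succ(C_{R(t_j)}) \setminus (D_{R(t_{k+1})} \cup \ldots \cup D_{R(t_m)}) \neq \emptyset$ for every $j \in \{k+1, \ldots, m\}$. 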

Proof For the only if part, we take any globally optimal repair $I'$ 
satisfying $\neg\Phi_i$. Conditions (i) and (ii) are trivially satisfied. 

Assume that $I'$ is the result of Algorithm 4 with the choice sequence 
$R(s_1), \ldots, R(s_\ell)$. Take any $j \in \{1, \ldots, k\}$ and let $j'$ 
be the smallest index of a fact from $C_{R(t_j)}$ in the sequence. Clearly, 
$R(s_{j'}) \in I'$. Since $R(t_j)$ also belongs to $I'$, both $R(s_{j'})$ 
and $R(t_j)$ belong to the same $(X, Y)$-cluster, i.e. 
$R(s_{j'}) \in D_{R(t_j)}$. Also, prior to selecting $R(s_{j'})$ the 
temporary instance $I^\circ$ contains $C_{R(t_j)}$. Therefore 
$R(s_{j'}) \in \omega_\succ(C_{R(t_j)})$, which proves (iii). 

We show (iv) similarly. For any $j \in \{k+1, \ldots, m\}$ let $j'$ be the 
smallest index of a fact from $C_{R(t_j)}$ in the sequence of choices used 
to construct $I'$. Prior to making the choice $R(s_{j'})$ the temporary 
instance $I^\circ$ contains $C_{R(t_j)}$, 
$R(s_{j'}) \in \omega_\succ(C_{R(t_j)})$, and $R(s_{j'})$ does not belong 
to any of $D_{R(t_{k+1})}, \ldots, D_{R(t_m)}$. 

For the if part, we construct $I'$ using Algorithm 4 with a choice sequence 
$R(s_1), \ldots, R(s_\ell)$ defined as follows. By (i) and (iii), for 
$j \in \{1, \ldots, k\}$ the choice $R(s_j)$ is any fact from 
$D_{R(t_j)} \cap \omega_\succ(C_{R(t_j)})$. By (ii) and (iv), for any 
$j \in \{k+1, \ldots, m\}$ the choice $R(s_j)$ is any fact from 
$\omega_\succ(C_{R(t_j)}) \setminus (D_{R(t_{k+1})} \cup \ldots \cup D_{R(t_m)})$. 
The remaining choices $R(s_j)$ for $j \in \{m+1, \ldots, \ell\}$ are 
selected in an arbitrary way. We observe that the first $k$ steps guarantee 
that the facts $R(t_1), \ldots, R(t_k)$ belong to the repair instance $I'$ 
(possibly placed there in later consecutive steps) and that $I'$ does not 
contain any of the facts $R(t_{k+1}), \ldots, R(t_m)$. □ 

Lemma 6 A Pareto optimal repair $I'$ satisfying $\neg\Phi_i$ exists if and 
only if the following conditions are satisfied: 

(i) $\{R(t_1), \ldots, R(t_k)\}$ is conflict-free; 

(ii) $\{D_{R(t_1)}, \ldots, D_{R(t_k)}\} \cap \{D_{R(t_{k+1})}, \ldots, D_{R(t_m)}\} = \emptyset$; 

(iii) for every $j \in \{1, \ldots, k\}$, for every fact 
$R(t) \in C_{R(t_j)} \setminus D_{R(t_j)}$ there exists 
$R(t') \in D_{R(t_j)}$ such that $R(t) \not\succ R(t')$; 

(iv) for every $j \in \{k+1, \ldots, m\}$ there exists an $(X, Y)$-cluster 
$D$ of $C_{R(t_j)}$ different from $D_{R(t_{k+1})}, \ldots, D_{R(t_m)}$ such 
that for every $R(t) \in D_{R(t_{k+1})} \cup \ldots \cup D_{R(t_m)}$, there 
exists $R(t') \in D$ such that $R(t) \not\succ R(t')$. 

Proof For the if part, we construct the repair $I'$ by selecting an 
$(X, Y)$-cluster from every $X$-cluster. Because Pareto optimality is 
defined in terms of neighboring facts and for one FD conflicts can be 
present only inside an $X$-cluster, to show that the repair $I'$ is Pareto 
optimal it is enough to show that for every $X$-cluster the selected 
$(X, Y)$-cluster is Pareto optimal (among all $(X, Y)$-clusters in the 
$X$-cluster). 

For $X$-clusters $C_{R(t_1)}, \ldots, C_{R(t_k)}$ we select 
$D_{R(t_1)}, \ldots, D_{R(t_k)}$, resp. We note that by (i) the 
$(X, Y)$-clusters belong to different $X$-clusters and by (ii) we do not 
include any of the facts $R(t_{k+1}), \ldots, R(t_m)$. Pareto optimality 
is implied by (iii). For $X$-clusters $C_{R(t_{k+1})}, \ldots, C_{R(t_m)}$ 
we select the $(X, Y)$-clusters as described in (iv). Pareto optimality of 
those clusters is also implied by (iv). For an $X$-cluster other than 
$C_{R(t_1)}, \ldots, C_{R(t_m)}$ we select any $(X, Y)$-cluster that is 
Pareto optimal (for the $X$-cluster). Since all selected 
$(X, Y)$-clusters are Pareto optimal, the instance $I'$ is a Pareto optimal 
repair such that $I' \models \neg\Phi_i$. 

For the only if part, (i) and (ii) are trivially implied by 
$I' \models \neg\Phi_i$. To show (iii) and (iv) we observe that a Pareto 
optimal repair contains exactly one Pareto optimal $(X, Y)$-cluster for 
every $X$-cluster. For clusters $C_{R(t_1)}, \ldots, C_{R(t_k)}$ this 
together with the fact that $\{R(t_1), \ldots, R(t_k)\} \subseteq I'$ 
implies (iii). For clusters $C_{R(t_{k+1})}, \ldots, C_{R(t_m)}$ this 
together with the fact that 
$\{R(t_{k+1}), \ldots, R(t_m)\} \cap I' = \emptyset$ implies (iv). □ 
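
To see how the per-cluster conditions of the two lemmas can differ, the following Python sketch evaluates condition (iii) of Lemma 5 and condition (iii) of Lemma 6 on a single $X$-cluster. It is an illustration under assumptions of ours: facts are tuples of a hypothetical relation $R(A, B, C)$ with the FD $A \to B$, the priority is a set of (dominating, dominated) pairs, and the function names are hypothetical. Conditions (i), (ii), and (iv) are not covered.

```python
def winnow(facts, priority):
    """omega(facts): the facts not dominated by another fact of `facts`."""
    return {x for x in facts if not any((y, x) in priority for y in facts)}


def meets_lemma5_iii(d_cluster, x_cluster, priority):
    """Lemma 5 (iii): the (X, Y)-cluster D intersects omega(C)."""
    return bool(set(d_cluster) & winnow(x_cluster, priority))


def meets_lemma6_iii(d_cluster, x_cluster, priority):
    """Lemma 6 (iii): every fact of C outside D fails to dominate some fact of D."""
    outside = set(x_cluster) - set(d_cluster)
    return all(
        any((y, x) not in priority for x in d_cluster)
        for y in outside
    )


# One X-cluster (A = 1) made of two (X, Y)-clusters, B = "a" and B = "b".
a1, a2 = (1, "a", 1), (1, "a", 2)
b1, b2 = (1, "b", 1), (1, "b", 2)
d_a, d_b = {a1, a2}, {b1, b2}
x_cluster = d_a | d_b
example_priority = {(b1, a1), (b2, a2)}

print(meets_lemma5_iii(d_a, x_cluster, example_priority))  # -> False
print(meets_lemma6_iii(d_a, x_cluster, example_priority))  # -> True
print(meets_lemma5_iii(d_b, x_cluster, example_priority))  # -> True
print(meets_lemma6_iii(d_b, x_cluster, example_priority))  # -> True
```

In this instance every fact of the first cluster is dominated, so the cluster contains no undominated fact, yet no single fact of the second cluster dominates both of its facts.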

8 Related work 

We limit our discussion to the work on using priorities to maintain consis- 
tency and facilitate resolution of conflicts. 

The first article to notice the importance of priorities in information 
systems is [15]. There, the problem of conflicting updates in 
(propositional) databases is solved in a manner similar to 
$\mathcal{C}Rep$. The considered priorities are transitive, which in our 
framework is too restrictive. Also, in our framework this restriction does 
not bring any computational benefits (the reductions can be modified to use 
only transitive priorities). [8] is another example of 
$\mathcal{C}Rep$-like prioritized conflict resolution of first-order 
theories. The basic framework is defined for priorities which are weak 
orders. A partial order is handled by considering every extension to a 
weak order. This approach also assumes the transitivity of the priority. 

In the context of logic programs, priorities among rules can be used to 
handle inconsistent logic programs (where rules imply contradictory facts). 
More preferred rules are satisfied, possibly at the cost of violating less 
important ones. In a manner analogous to Proposition 3, [25] lifts a total 
order on rules to a preference on (extended) answer sets. When computing 
answers, only maximally preferred answer sets are considered. 

A simpler approach to the problem of inconsistent logic programs is 
presented in [19]. There, conflicting facts are removed from the model 
unless the priority specifies how to resolve the conflict. Because only 
programs without disjunction are considered, this approach always returns 
exactly one model of the input program. Constructing preferred repairs in 
a corresponding fashion (by removing all conflicts unless the priority 
indicates a resolution) would similarly return exactly one database 
instance (fulfillment of P1 and P4). However, if the priority is not total, 
the returned instance is not a repair and therefore P5 is not satisfied. 
Such an approach leads to a loss of (disjunctive) information and does not 
satisfy P2 and P3. 

[9] proposes a framework of conditioned active integrity constraints, which 
allows the user to specify the way some of the conflicts created with a 
constraint can be resolved. This framework satisfies properties P1 and P2 
but not P3 and P4. [9] also describes how to translate conditioned active 
integrity constraints into a prioritized logic program [22], whose 
preferred models correspond to maximally preferred repairs. 

[21] uses ranking functions on facts to resolve conflicts by taking only 
the fact with the highest rank and removing others. This approach 
constructs a unique repair under the assumption that no two different facts 
are of equal rank (satisfaction of P4). If this assumption is not satisfied 
and the facts contain numeric values, a new value, called the fusion, can 
be calculated from the conflicting facts (then, however, the constructed 
instance is not necessarily a repair in the sense of Definition 3, which 
means a possible loss of information). 

A different approach based on ranking is studied in [18]. The authors 
consider polynomial functions that are used to rank repairs. When 
computing preferred consistent query answers, only repairs with the highest 
rank are considered. The properties P2 and P5 are trivially satisfied, but 
because this form of preference information does not have natural notions 
of extensions and maximality, it is hard to discuss postulates P3 and P4. 
Also, the preference among repairs in this method is not based on the way 
in which the conflicts are resolved. 

An approach where the user has a certain degree of control over the way 
the conflicts are resolved is presented in [17]. Using repair constraints 
the user can restrict considered repairs to those where facts from one 
relation have been removed only if similar facts have been removed from 
some other relation. This approach satisfies P5 but not P1. A method of 
weakening the repair constraints is proposed to get P1; however, this comes 
at the price of losing P3. 

In [3], Andritsos et al. extend the framework of consistent query answers 
with techniques of probabilistic databases. Essentially, only one key 
dependency per relation is considered and user preference is expressed by 
assigning a probability value to each of the mutually conflicting facts. 
The probability values must sum to 1 over every clique in the conflict 
graph. This framework generalizes the standard framework of consistent 
query answers: the repairs correspond to possible worlds and have an 
associated probability. We also note that no repairs are removed from 
consideration (unless the probability of the world is 0). The query is 
evaluated over all repairs and the probability assigned to an answer is 
the sum of probabilities of worlds in which the answer is present. 
Although the considered databases are repairs, the use of the associated 
probability values makes it difficult to compare this framework with ours. 

Figure 8: Summary of complexity results. 

9 Conclusions and future work 

In this paper we proposed a general framework of preferred repairs and 
preferred consistent query answers. We also proposed a set of desirable 
properties of a family of preferred repairs. We presented three families of 
preferred repairs, $\mathcal{P}Rep$, $\mathcal{G}Rep$, and 
$\mathcal{C}Rep$, based on different notions of optimality of conformance 
with the priority. For every family we presented a sound and complete 
database repairing algorithm. Figure 8 summarizes the computational 
complexity results; its first row is taken from [12]. 

We envision several directions for further work. We plan to investigate 
other interesting ways of selecting preferred repairs with priorities. Also, 
extending our approach to cyclic priorities is an interesting and 
challenging issue. Including priorities in similar frameworks of 
preferences [17] leads to losing monotonicity. A modified, conditional 
version of monotonicity may be necessary to capture non-trivial families of 
repairs. 

Along the lines of [5], the computational complexity results could be 
further studied, by assuming the conformance of functional dependencies 
with BCNF. 

Finally, the class of constraints can be extended to universal constraints 
[23]. This class of constraints allows expressing conflicts caused not only 
by the presence of some facts but also by the simultaneous absence of other 
facts. Conflict hypergraphs can be generalized to extended conflict 
hypergraphs, which include negative facts. 